Abstract: This work proposes a decentralized, iterative, Bayesian algorithm called CB-DSBL for in-network estimation of multiple jointly sparse vectors by a network of nodes, using noisy and underdetermined linear measurements. The proposed algorithm exploits the network-wide joint sparsity of the unknown sparse vectors to recover them from significantly fewer local measurements than standalone sparse signal recovery schemes require. To reduce the amount of inter-node communication and the associated overheads, the nodes exchange messages with only a small subset of their single hop neighbors. Under this communication scheme, we separately analyze the convergence of the underlying Alternating Directions Method of Multipliers (ADMM) iterations used in our proposed algorithm and establish their linear convergence rate. The findings from the convergence analysis of decentralized ADMM are used to accelerate the convergence of the proposed CB-DSBL algorithm. Using Monte Carlo simulations, we demonstrate the superior signal reconstruction as well as support recovery performance of our proposed algorithm compared to existing decentralized algorithms: DRL-1, DCOMP and DCSP.

Index Terms: Decentralized Estimation, Distributed Compressive Sensing, Joint Sparsity, Sparse Bayesian Learning, Sensor Networks.


I. Introduction 

We consider the problem of in-network estimation of multiple joint-sparse vectors by a network of connected agents or processing nodes, using noisy and underdetermined linear measurements. Two or more vectors in $\mathbb{R}^n$ are called joint-sparse if, in addition to each vector being individually sparse¹, their nonzero coefficients belong to a common index set. Joint sparsity occurs naturally in scenarios involving multiple agents trying to learn a sparse representation of a common physical phenomenon. Since the underlying physical phenomenon is the same for all the agents (with similar acquisition modalities), their individual sparse representations/model parameters tend to exhibit joint sparsity. In this work, we consider joint-sparse vectors which belong to the Type-2 Joint Sparse Model [?] or JSM-2, one of the three generative models for joint-sparse signals. JSM-2 signal vectors satisfy the property that their nonzero coefficients are uncorrelated within and across the vectors. JSM-2 has been successfully used in several applications such as cooperative spectrum sensing, decentralized event detection [?], [?], multi-task compressive sensing [7] and MIMO channel estimation.

This work has appeared in part in 

¹A vector in $\mathbb{R}^n$ is said to be $k$-sparse if only $k$ ($\ll n$) of its $n$ coefficients are nonzero.


To further motivate the signal structure of joint sparsity in a distributed setup, consider the problem of detection/classification of randomly occurring events in a field by multiple sensor nodes. Each sensor node $j$, $1 \le j \le L$, employs a dictionary $\Psi_j = [\psi_j^1, \psi_j^2, \ldots, \psi_j^c]$, whose $i$-th column $\psi_j^i$ is the signature corresponding to the $i$-th event, one out of the $c$ events which can potentially occur. In many cases, due to the inability to accurately model the sensing process, the signature vectors $\psi_j^i$ are simply chosen to be the past recordings of the $j$-th sensor corresponding to the standalone occurrence of the $i$-th event, averaged across multiple experiments [?]. This procedure can result in a dictionary whose columns are highly correlated. Thus, for any $k$ ($\ll c$) events occurring simultaneously, a noisy sensor recording might belong to multiple subspaces, each spanned by a different subset of columns of the local dictionary. In such a scenario, enforcing joint sparsity across the sensor nodes can resolve the ambiguity in selecting the correct subset of columns at each sensor node.

In this work, we consider a distributed setup where each 
individual joint-sparse vector is estimated by a distinct node in 
a network comprising multiple nodes, with each node having 
access to noisy and underdetermined linear measurements of 
its local sparse vector. By collaborating with each other, these 
nodes can exploit the underlying joint sparsity of their local 
sparse vectors to reduce the measurements required per node 
or improve the quality of their local signal estimates. In [?],
it has been shown that the number of local measurements 
required for common support recovery can be dramatically 
reduced by exploiting the joint sparsity structure prevalent 
across the network. In fact, as the nodes increase in number, 
exact signal reconstruction is possible from as few as k 
measurements per node, where k denotes the size of the 
support set. Such a substantial reduction in the number of 
measurements is highly desirable, especially in applications 
where the cost or time required to acquire new measurements 
is high. 

Distributed algorithms for JSM-2 signal recovery come in 
two flavors - centralized and decentralized. In the centralized 
approach, each node transmits its local measurements to a 
fusion center (FC) which runs a joint-sparse signal recovery 
algorithm. The FC then transmits the reconstructed sparse 
signal estimates back to their respective nodes. In contrast, 
in a decentralized approach, the goal is to obtain the same 
solution as with the centralized scheme at all nodes by 
allowing each node to exchange information with its single hop 
neighbors in addition to processing its local measurements. 
Besides being inherently robust to node failures, decentralized 
schemes also tend to be more energy efficient as the inter-node
communication is restricted to relatively short ranges covering 
only one hop communication links. In this work, we focus 
on the decentralized approach for solving the sparse signal 
recovery problem under the JSM-2 signal model. 


A. Related Work 


In this subsection, we briefly summarize the existing centralized and decentralized algorithms for JSM-2 signal recovery. The earliest work on joint-sparse signal recovery considered extensions of recovery algorithms meant for the single measurement vector setup to the centralized multiple measurement vector (MMV) model, and demonstrated the significant performance gains that are achievable by exploiting the joint sparsity structure. MMV Basic Matching Pursuit (M-BMP), MMV Orthogonal Matching Pursuit (M-OMP) and MMV FOCal Underdetermined System Solver (M-FOCUSS), introduced in [?], belong to this category. In [?], joint sparsity was exploited for distributed encoding of multiple sparse signals. This work modeled joint-sparse signals as being generated according to one of three joint-sparse signal models (JSM-1, 2, 3), and also proposed a centralized greedy algorithm called Simultaneous Orthogonal Matching Pursuit (SOMP) [?] for JSM-2 recovery. In [13], the Alternating Directions Method for the MMV setup (ADM-MMV) was proposed, which uses an $\ell_2/\ell_1$ mixed norm penalty to promote a joint-sparse solution. In [?], the multiple response sparse Bayesian learning (M-SBL) algorithm was proposed as an MMV extension of the SBL algorithm [?]. Unlike the algorithms discussed earlier, M-SBL adopts a probabilistic approach by seeking the maximum a posteriori (MAP) estimate of the JSM-2 signals. In M-SBL, a joint-sparse solution is encouraged by assuming a joint sparsity inducing parameterized prior on the unknown sparse vectors, with the prior parameters learnt directly from the measurements. M-SBL has been shown to outperform deterministic methods based on norm relaxation, such as M-BMP and M-FOCUSS [11], as well as greedy algorithms such as SOMP. AMP-MMV [16] is another Bayesian algorithm, which uses approximate message passing (AMP) to obtain marginalized conditional posterior distributions of joint-sparse signals. Owing to their low computational complexity, AMP based algorithms are suitable for recovering signals with large dimensions. However, they are known to converge reliably only for large dimensional, randomly constructed measurement matrices. Interested readers are referred to [?] for an excellent study comparing some of the aforementioned centralized JSM-2 signal recovery algorithms.

Among decentralized algorithms, collaborative orthogonal matching pursuit (DCOMP) [18] and collaborative subspace pursuit (DCSP) [19] are greedy algorithms for JSM-2 signal recovery, and both are computationally very fast. However, as demonstrated later in this paper, they do not perform as well as regularization based methods, which induce joint sparsity in their solution either directly, by employing a suitable penalty, or indirectly, via a joint signal prior. Moreover, both DCOMP and DCSP assume a priori knowledge of the size of the nonzero support set, which could be unknown or hard to estimate.


Decentralized row-based LASSO (DR-LASSO) [20] is an iterative alternating minimization algorithm which optimizes a non-convex objective with $\ell_1$-$\ell_2$ mixed norm based regularization to obtain a joint-sparse solution. The decentralized reweighted $\ell_1$ ($\ell_2$) minimization algorithms DRL-1,2 [?] employ a non-convex sum-log-sum penalty to promote a joint-sparse solution. Although non-convex regularizers induce sparsity much more strongly than convex $\ell_1$ norm based regularizers, the resulting non-convex optimization can be difficult to solve efficiently. In DRL-1/2, the non-convex objective is replaced by a surrogate convex function constructed from iteration dependent weighted $\ell_1/\ell_2$ norm terms. Using a non-convex sum-log-sum regularization results in a sparser solution compared to the convex regularization used in DR-LASSO. However, both DR-LASSO and DRL-1,2 necessitate cross validation to tune the amount of regularization needed for optimal support recovery performance. DRL-1,2 also requires proper tuning of a so-called smoothing parameter and an ADMM parameter for its optimal performance. By employing a Bayesian approach, we can completely eliminate any need for cross validation by learning the parameters of a family of signal priors, such that the selected signal prior has maximum Bayesian evidence. DCS-AMP [?] is one such decentralized algorithm, which employs approximate message passing to learn a parameterized joint sparsity inducing Bernoulli-Gaussian signal prior. Turbo Bayesian Compressive Sensing (Turbo-BCS) [22], another decentralized algorithm, adopts a more relaxed zero mean Gaussian signal prior, with the variance hyperparameters themselves distributed according to an exponential distribution. This relaxation of the signal prior results in improved MSE without compromising on the sparsity of the solution. Turbo-BCS, however, involves direct exchange of signal estimates between the nodes, which renders it unsuitable for applications where it is necessary to preserve the privacy of the local signals.

B. Contributions 

Our main contributions in this work are as follows: 

1) We propose a novel decentralized, iterative, Bayesian joint-sparse signal recovery algorithm called Consensus Based Distributed Sparse Bayesian Learning, or CB-DSBL. CB-DSBL works by establishing network-wide consensus on the estimated parameters of a joint sparsity inducing signal prior. The learnt signal prior is subsequently used by the individual nodes to obtain MAP estimates of their local sparse signal vectors. The proposed CB-DSBL algorithm does not require direct exchange of either local measurements or signal estimates between the nodes, and hence is well suited for applications where it is important to preserve the privacy of the local signal coefficients.

2) The proposed algorithm employs the Alternating Directions Method of Multipliers (ADMM) to solve a series of iteration dependent consensus optimization problems which require the nodes to exchange messages with each other. To reduce the associated communication overheads, we adopt a bandwidth efficient inter-node communication scheme. This scheme entails each node exchanging
TABLE I: Comparison of Decentralized Joint-Sparse Signal Recovery Algorithms


Decentralized 

algorithm 

Per node, per iteration 
computational complexity 

Per node, per 
iteration 
communication 
complexity 

Privacy of 
local signal 
estimates 

Tunable 
parameters 
(if any) 

Assumes 
a priori 
knowledge of 
sparsity level 

DCSP [19]

0(mn (^n k log n + m^) 

0((^n + k log n) 

Yes 

None 

Yes 

DCOMP [18]

OinC + L) 

0{Cn + L) 

Yes 

None 

Yes 

DRL-1 [?]

0((n^ + + nrA)rn,m + (n) 

0(Cn) 

Yes 

Yes 

No 

DR-LASSO [20]

Oin^mTi + CT 1 T 2 ) 

0{CnT2) 

Yes 

Yes 

No 

Turbo-BCS [22]

0{n^ + nL + nk^ k^ mk) 

O(kL)

No 

None 

No 

DCS-AMP [?]

0{mn + (^n + cin) 

0(Cn) 

Yes 

Yes 

No 

CB-DSBL (proposed) 

0(n^ + + nrA + Cnr„a!i) 

0(Cnr„ua) 

Yes 

None 

No 


1. $n$, $m$, $k$ and $L$ stand for the dimension of the unknown sparse vector, the number of local measurements per node, the number of nonzero coefficients in the true support and the network size, respectively.

2. is the maximum number of communication links activated per node, per communication round.

3. $r_{\max}$ is the number of inner loop ADMM iterations executed per CB-DSBL iteration.

4. $r_{\max}$ also denotes the number of ADMM iterations used to obtain an inexact solution to the weighted $\ell_1$ norm based subproblem in the inner loop of DRL-1.

5. $T_1$ and $T_2$ denote the numbers of iterations of the two inner loops executed per DR-LASSO iteration.


messages with only a predesignated subset of its single hop neighbors, known as bridge nodes, as motivated in [23]. By selecting these bridge nodes, one can trade off between communication bandwidth requirements and the ADMM's robustness to node failures. In this connection, we analytically establish the relationship between the selected set of bridge nodes and the convergence rate of the ADMM iterations. For the bridge node based inter-node communication scheme, we show a linear rate of convergence for the ADMM iterations when applied to a generic consensus optimization problem. The analysis is useful in obtaining a closed form expression for the tunable parameter of our proposed joint-sparse signal recovery algorithm, ensuring its fast convergence.

3) We empirically demonstrate the superior MSE and support recovery performance of CB-DSBL in comparison to existing decentralized algorithms: DRL-1, DCOMP and DCSP.


In Table I, we compare the existing decentralized joint-sparse signal recovery schemes with respect to their per iteration computational and communication complexity, privacy of local estimates, presence/absence of tunable parameters and dependence on prior knowledge of the sparsity level. As highlighted in Table I, CB-DSBL belongs to a handful of decentralized algorithms for joint-sparse signal recovery which do not require a priori knowledge of the sparsity level, rely only on single hop communication, and do not involve direct exchange of local signal estimates between network nodes. Besides this, unlike loopy Belief Propagation (BP) or Approximate Message Passing (AMP) based Bayesian algorithms, CB-DSBL does not suffer from convergence issues even when the local measurement matrix at each node is dense or not randomly constructed.

The rest of this paper is organized as follows. Section II describes the system model and the problem statement of distributed JSM-2 signal recovery. Section III discusses centralized M-SBL [?] adapted to our setup, and sets the stage for our proposed decentralized solution. Section IV develops the proposed CB-DSBL algorithm along with a detailed discussion of the convergence properties of the underlying ADMM iterations; other implementation specific issues are also discussed there. Section V compares the performance of the proposed algorithm with existing ones with respect to various performance metrics. Finally, Section VI concludes the paper.

Notation: Boldface lowercase and uppercase alphabets are used to denote vectors and matrices, respectively. Script styled alphabets (for example, $\mathcal{A}$) are used to denote sets. $|\mathcal{A}|$ denotes the cardinality of set $\mathcal{A}$. The term $x_j^k(i)$ denotes the $i$-th element of vector $x$ associated with node $S_j$ at the $k$-th iteration/time index. The superscript $(\cdot)^T$ denotes the transpose operation. For matrices $A$ and $B$ of sizes $m \times n$ and $p \times q$, respectively, $A \otimes B$ denotes their Kronecker product, which is of size $mp \times nq$. $\mathcal{N}(\mu, \Sigma)$ denotes the Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$. $E(x|y)$ denotes the expectation of random variable $x$ conditioned on another random variable $y$.


II. Distributed JSM-2 System Model

We consider a network of $L$ nodes/sensors described by a bi-directional graph $\mathcal{G} = (\mathcal{J}, \mathcal{A})$. $\mathcal{J} = \{1, 2, \ldots, L\}$ is the set of vertices in $\mathcal{G}$, each vertex representing a node in the network. The set $\mathcal{A}$ contains the edges in $\mathcal{G}$, each edge representing a single hop error-free communication link between a distinct pair of nodes. Each node is interested in estimating an unknown $k$-sparse vector $x_j \in \mathbb{R}^n$ from $m$ locally acquired noisy linear measurements $y_j \in \mathbb{R}^m$. The generative model of the local measurement vector $y_j$ at node $j$ is given by

$$y_j = \Phi_j x_j + w_j, \quad 1 \le j \le L, \tag{1}$$

where $\Phi_j \in \mathbb{R}^{m \times n}$ is a full rank sensing matrix and $w_j \in \mathbb{R}^m$ is the measurement noise, modeled as zero mean Gaussian distributed with covariance matrix $\sigma_j^2 I_m$. The sparse vectors $x_1, x_2, \ldots, x_L$ at different nodes follow the JSM-2 signal model [?]. This implies that all $x_j$ share a common support, represented by the index set $\mathcal{S}$. From the JSM-2 model, it also follows that the nonzero coefficients of the sparse vectors are independent within and across the vectors.
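To make the model in (1) concrete, the following minimal Python/NumPy sketch draws JSM-2 signals and their local measurements. The helper name `generate_jsm2`, the Gaussian sensing matrices with entry variance $1/m$, and the standard Gaussian nonzero coefficients are illustrative assumptions rather than choices prescribed here; the only structure taken from the text is that all $x_j$ share one support $\mathcal{S}$ and have independent nonzero entries.

```python
import numpy as np

def generate_jsm2(n, m, k, L, sigma2, seed=0):
    """Draw L jointly k-sparse vectors x_j and measurements y_j = Phi_j x_j + w_j.

    Hypothetical helper: Gaussian Phi_j and Gaussian nonzero entries are
    illustrative assumptions, not part of the JSM-2 definition itself.
    """
    rng = np.random.default_rng(seed)
    support = np.sort(rng.choice(n, size=k, replace=False))  # common support S
    Phis, xs, ys = [], [], []
    for j in range(L):
        Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # local sensing matrix Phi_j
        x = np.zeros(n)
        x[support] = rng.standard_normal(k)                  # independent within/across nodes
        w = np.sqrt(sigma2[j]) * rng.standard_normal(m)      # w_j ~ N(0, sigma_j^2 I_m)
        Phis.append(Phi)
        xs.append(x)
        ys.append(Phi @ x + w)
    return support, Phis, xs, ys

# Example: L = 4 nodes sharing one 5-element support in R^50, m = 10 measurements each
support, Phis, xs, ys = generate_jsm2(n=50, m=10, k=5, L=4, sigma2=[1e-3] * 4)
```

Each node would only ever see its own triple $(y_j, \Phi_j, \sigma_j^2)$; the shared `support` is returned here purely for evaluating a recovery algorithm.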

The goal is to recover the local sparse vectors $x_1, x_2, \ldots, x_L$ at their respective nodes using decentralized processing. In addition to processing the local data $\{y_j, \Phi_j, \sigma_j^2\}$, each node must collaborate with its single hop neighboring nodes to exploit the network-wide joint sparsity of the unknown sparse vectors. For the sake of privacy, the nodes are prohibited from directly exchanging their local measurements or local signal estimates. Finally, the decentralized algorithm should be able to generate the centralized solution at each node, as if each node had access to the entire global information, i.e., $\{y_j, \Phi_j, \sigma_j^2\}_{j \in \mathcal{J}}$.

III. Centralized Algorithm for JSM-2 

In this section, we briefly recall the centralized M-SBL algorithm [?] for JSM-2 signal recovery and extend it to support distinct measurement matrices $\Phi_j$ and noise variances $\sigma_j^2$ at each node. The centralized algorithm runs at an FC, which is assumed to have complete knowledge of the network-wide information $\{y_j, \Phi_j, \sigma_j^2\}_{j \in \mathcal{J}}$. For ease of notation, we introduce two variables $X = \{x_1, x_2, \ldots, x_L\}$ and $Y = \{y_1, y_2, \ldots, y_L\}$ to be used in the sequel.

Similar to M-SBL, each of the sparse vectors $x_j$, $j \in \mathcal{J}$, is assumed to be distributed according to a parameterized signal prior $p(x_j; \gamma)$ shown below:

$$p(x_j; \gamma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\gamma(i)}} \exp\left(-\frac{x_j(i)^2}{2\gamma(i)}\right). \tag{2}$$

Further, the joint signal prior $p(X; \gamma)$ is assumed to be given by

$$p(X; \gamma) = \prod_{j=1}^{L} p(x_j; \gamma). \tag{3}$$

In the above, $\gamma = (\gamma(1), \gamma(2), \ldots, \gamma(n))^T$ is an $n$ dimensional hyperparameter vector, whose $i$-th entry, $\gamma(i)$, models the common variance of $x_j(i)$ for $1 \le j \le L$. Since the signal priors $p(x_j; \gamma)$ are parameterized by a common $\gamma$, if $\gamma$ has a sparse support $\mathcal{S}$, then the MAP estimates of $x_1, x_2, \ldots, x_L$ will also be jointly sparse with the same common support $\mathcal{S}$. The Gaussian prior in (2) promotes sparsity as it has an alternate interpretation as a parameterized model for the family of variational approximations to a sparsity inducing Student's t-distributed prior [24]. Under this interpretation, finding the hyperparameter vector $\gamma$ which maximizes the likelihood $p(Y; \gamma)$ is equivalent to finding the variational approximation which has the largest Bayesian evidence.

Let $\gamma_{\mathrm{ML}}$ denote the maximum likelihood (ML) estimate of the hyperparameters of the joint source prior:

$$\gamma_{\mathrm{ML}} = \arg\max_{\gamma} \; p(Y; \gamma), \tag{4}$$

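where $p(Y; \gamma)$ is a type-2 likelihood function obtained by marginalizing the joint density $p(Y, X; \gamma)$ with respect to the unknown vectors in $X$, i.e.,

$$p(Y; \gamma) = \prod_{j=1}^{L} \int p(y_j | x_j)\, p(x_j; \gamma)\, dx_j = \prod_{j=1}^{L} \mathcal{N}\left(y_j; 0, \sigma_j^2 I_m + \Phi_j \Gamma \Phi_j^T\right). \tag{5}$$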

Here, $\Gamma = \mathrm{diag}(\gamma)$. We note that $\gamma_{\mathrm{ML}}$ cannot be derived in closed form by directly maximizing the likelihood in (5) with respect to $\gamma$. Hence, as suggested in the SBL framework [15], we use the expectation maximization (EM) procedure to maximize $\log p(Y; \gamma)$ by treating $X$ as hidden variables.
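Although (5) cannot be maximized in closed form, it is cheap to evaluate, which is handy for monitoring EM progress. The sketch below (a hypothetical helper `type2_loglik`, assuming NumPy) sums the Gaussian log densities in (5), using a log-determinant and a linear solve rather than explicit matrix inverses.

```python
import numpy as np

def type2_loglik(ys, Phis, sigma2, gamma):
    """log p(Y; gamma) in (5): sum over nodes of log N(y_j; 0, C_j),
    where C_j = sigma_j^2 I_m + Phi_j diag(gamma) Phi_j^T."""
    total = 0.0
    for y, Phi, s2 in zip(ys, Phis, sigma2):
        m = y.shape[0]
        C = s2 * np.eye(m) + (Phi * gamma) @ Phi.T   # Phi * gamma scales column i by gamma(i)
        _, logdet = np.linalg.slogdet(C)             # C is positive definite, so sign = +1
        total += -0.5 * (m * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
    return total
```

Because the nodes are conditionally independent given $\gamma$, the value is simply the sum of per-node terms, which is what the loop computes.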

We now discuss the main steps of the EM algorithm to obtain $\gamma_{\mathrm{ML}}$. Let $q_\theta(X)$ denote the variational approximation of the true conditional density $p(X | Y; \gamma)$ with variational parameter set $\theta = \{\mu_j, \Sigma_j\}_{j \in \mathcal{J}}$. The variational parameters $\mu_j$ and $\Sigma_j$ represent the conditional mean and covariance of $x_j$ given $y_j$. Then, as shown in [25], the log likelihood admits the following decomposition:


$$\log p(Y; \gamma) = \int q_\theta(X) \log \frac{p(Y, X; \gamma)}{q_\theta(X)}\, dX + D\left(q_\theta(X) \,\|\, p(X | Y; \gamma)\right), \tag{6}$$


where the term $D(q_\theta \| p) = \int q_\theta(X) \log \frac{q_\theta(X)}{p(X | Y; \gamma)}\, dX$ is the Kullback-Leibler (KL) divergence between the probability densities $q_\theta$ and $p$. From the non-negativity of $D(q_\theta \| p)$ [?], the log likelihood is lower bounded by the first term on the RHS. In the E-step, we choose $\theta$ to make this variational lower bound tight by minimizing the KL divergence term:


$$\theta^{k+1} = \arg\min_{\theta} D\left(q_\theta(X) \,\|\, p(X | Y; \gamma^k)\right). \tag{7}$$


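Here, $k$ denotes the iteration index of the EM algorithm. From LMMSE theory, $p(x_j | y_j; \gamma^k)$ is Gaussian with mean $\mu_j^{k+1}$ and covariance $\Sigma_j^{k+1}$ given by

$$\Sigma_j^{k+1} = \left(\sigma_j^{-2} \Phi_j^T \Phi_j + (\Gamma^k)^{-1}\right)^{-1} \quad \text{and} \quad \mu_j^{k+1} = \sigma_j^{-2}\, \Sigma_j^{k+1} \Phi_j^T y_j. \tag{8}$$

By choosing $\theta^{k+1} = \{\mu_j^{k+1}, \Sigma_j^{k+1}\}_{j \in \mathcal{J}}$, i.e., $q_{\theta^{k+1}}(X) = \prod_{j \in \mathcal{J}} \mathcal{N}(x_j; \mu_j^{k+1}, \Sigma_j^{k+1})$, the KL divergence term in (7) can be driven to its minimum value of zero.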
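The E-step (8) is purely local to each node. One practical caveat, which is our assumption rather than something stated in the text, is that forming $(\Gamma^k)^{-1}$ becomes ill-conditioned as the inactive $\gamma(i)$ shrink toward zero. The sketch below (hypothetical function name `e_step`, assuming NumPy) therefore evaluates (8) through the algebraically equivalent Woodbury form $\Sigma_j = \Gamma - \Gamma\Phi_j^T C_j^{-1}\Phi_j\Gamma$, $\mu_j = \Gamma\Phi_j^T C_j^{-1} y_j$, with $C_j = \sigma_j^2 I_m + \Phi_j\Gamma\Phi_j^T$.

```python
import numpy as np

def e_step(y, Phi, s2, gamma):
    """Posterior N(mu_j, Sigma_j) of x_j given y_j, i.e. (8), in Woodbury form.

    Sigma = Gamma - Gamma Phi^T C^{-1} Phi Gamma and mu = Gamma Phi^T C^{-1} y,
    with C = s2 I_m + Phi Gamma Phi^T, so Gamma itself is never inverted.
    """
    G_PhiT = gamma[:, None] * Phi.T                 # Gamma Phi^T, shape (n, m)
    C = s2 * np.eye(Phi.shape[0]) + Phi @ G_PhiT    # s2 I_m + Phi Gamma Phi^T
    K = np.linalg.solve(C, G_PhiT.T).T              # Gamma Phi^T C^{-1} (C is symmetric)
    Sigma = np.diag(gamma) - K @ G_PhiT.T           # Gamma - Gamma Phi^T C^{-1} Phi Gamma
    mu = K @ y                                      # equals s2^{-1} Sigma Phi^T y
    return mu, Sigma
```

Only an $m \times m$ system is solved per node, which is also cheaper than the $n \times n$ inverse in (8) when $m < n$.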

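In the M-step, we choose $\gamma$ to maximize the tight variational lower bound obtained in the E-step:

$$\gamma^{k+1} = \arg\max_{\gamma} \int q_{\theta^{k+1}}(X) \log \frac{p(Y, X; \gamma)}{q_{\theta^{k+1}}(X)}\, dX = \arg\max_{\gamma} E_{X | Y; \gamma^k}\left[\log p(Y, X; \gamma)\right]. \tag{9}$$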

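As shown in the Appendix, the optimization problem (9) can be recast as the following minimization problem:

$$\gamma^{k+1} = \arg\min_{\gamma} \sum_{j \in \mathcal{J}} \sum_{i=1}^{n} \left(\log \gamma(i) + \frac{\Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2}{\gamma(i)}\right). \tag{10}$$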


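From the zero gradient optimality condition in (10), the M-step reduces to the following update rule:

$$\gamma^{k+1}(i) = \frac{1}{L} \sum_{j \in \mathcal{J}} \left(\Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2\right), \quad \text{for } 1 \le i \le n. \tag{11}$$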
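For completeness, the zero gradient step behind (11) is short, because the objective in (10) decouples across the coordinates $\gamma(i)$:

$$\frac{\partial}{\partial \gamma(i)} \sum_{j \in \mathcal{J}} \left(\log \gamma(i) + \frac{\Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2}{\gamma(i)}\right) = \frac{L}{\gamma(i)} - \frac{\sum_{j \in \mathcal{J}} \left(\Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2\right)}{\gamma(i)^2} = 0 \;\Longrightarrow\; \gamma(i) = \frac{1}{L} \sum_{j \in \mathcal{J}} \left(\Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2\right).$$

The derivative is negative for $\gamma(i)$ below this value and positive above it, so the stationary point is the global minimizer over $\gamma(i) > 0$, even though the objective in (10) is not convex.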

By repeatedly iterating between the E-step (8) and the M-step (11), the EM algorithm converges to either a local maximum or a saddle point of $\log p(Y; \gamma)$ [?]. Once $\gamma_{\mathrm{ML}}$ is obtained, the MAP estimate of $x_j$ is evaluated by substituting it in the expression for $\mu_j$ in (8). It is observed that when the EM algorithm converges, the $\gamma(i)$'s belonging to the inactive support tend to zero, resulting in sparse MAP estimates. In practice, hard thresholding of $\gamma$ is required to identify the nonzero support set. In this work, we remove from the active support set all coefficients for which $\gamma(i)$, $1 \le i \le n$, is below the local noise variance. It must be noted that if the local noise variance at each node is unknown, it can be estimated along with $\gamma$ within the EM framework, as discussed in [14].
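Putting (8), (11) and the thresholding rule together, a centralized M-SBL run at the FC can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (a fixed number of EM iterations, initialization $\gamma = 10^{-3}\mathbf{1}$, hypothetical function names), not a reference implementation; it also records the type-2 log-likelihood (5), which EM should never decrease.

```python
import numpy as np

def e_step(y, Phi, s2, gamma):
    """Posterior mean/covariance of x_j, cf. (8), via the Woodbury form."""
    G_PhiT = gamma[:, None] * Phi.T                       # Gamma Phi^T
    C = s2 * np.eye(Phi.shape[0]) + Phi @ G_PhiT          # s2 I + Phi Gamma Phi^T
    K = np.linalg.solve(C, G_PhiT.T).T                    # Gamma Phi^T C^{-1}
    return K @ y, np.diag(gamma) - K @ G_PhiT.T

def loglik(ys, Phis, sigma2, gamma):
    """Type-2 log-likelihood log p(Y; gamma) from (5)."""
    total = 0.0
    for y, Phi, s2 in zip(ys, Phis, sigma2):
        C = s2 * np.eye(len(y)) + (Phi * gamma) @ Phi.T
        total -= 0.5 * (len(y) * np.log(2 * np.pi) + np.linalg.slogdet(C)[1]
                        + y @ np.linalg.solve(C, y))
    return total

def msbl(ys, Phis, sigma2, n_iter=50, gamma0=1e-3):
    """Centralized EM: E-step (8) for every node, then M-step (11)."""
    gamma = np.full(Phis[0].shape[1], gamma0)
    history = [loglik(ys, Phis, sigma2, gamma)]
    for _ in range(n_iter):
        alphas = []
        for y, Phi, s2 in zip(ys, Phis, sigma2):
            mu, Sigma = e_step(y, Phi, s2, gamma)
            alphas.append(np.diag(Sigma) + mu**2)          # Sigma_j(i,i) + mu_j(i)^2
        gamma = np.mean(alphas, axis=0)                    # (11): average over nodes
        history.append(loglik(ys, Phis, sigma2, gamma))
    # MAP estimates: plug the final gamma into the posterior mean in (8)
    x_hat = [e_step(y, Phi, s2, gamma)[0] for y, Phi, s2 in zip(ys, Phis, sigma2)]
    return gamma, x_hat, history

def support_at_node(gamma, noise_var):
    """Hard thresholding: drop indices whose gamma(i) is below the local noise variance."""
    return np.flatnonzero(gamma >= noise_var)
```

Keeping the likelihood history is optional, but a decreasing entry would immediately flag a bug in the E-step or M-step.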


IV. Decentralized Algorithm for JSM-2


A. Algorithm Development 

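In this section, we develop a decentralized version of the centralized algorithm discussed in the previous section. For notational convenience, we introduce an $n$ length vector $\alpha_j^{k+1} = (\alpha_j^{k+1}(1), \alpha_j^{k+1}(2), \ldots, \alpha_j^{k+1}(n))^T$ maintained at node $j$, where $\alpha_j^{k+1}(i) = \Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2$, and $\mu_j^{k+1}$ and $\Sigma_j^{k+1}$ are as defined in (8).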

From (11), we observe that the solution of the M-step optimization (10) can be interpreted as an average of the $L$ vectors $\{\alpha_j^{k+1}\}_{j=1}^{L}$. The same solution can also be obtained by solving a different minimization problem:

$$\gamma^{k+1} = \arg\min_{\gamma} \sum_{j \in \mathcal{J}} \left\|\gamma - \alpha_j^{k+1}\right\|_2^2. \tag{12}$$
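That (12) indeed reproduces (11) takes one line: setting the gradient of the objective in (12) to zero gives

$$\nabla_{\gamma} \sum_{j \in \mathcal{J}} \left\|\gamma - \alpha_j^{k+1}\right\|_2^2 = 2 \sum_{j \in \mathcal{J}} \left(\gamma - \alpha_j^{k+1}\right) = 0 \;\Longrightarrow\; \gamma^{k+1} = \frac{1}{L} \sum_{j \in \mathcal{J}} \alpha_j^{k+1},$$

which coincides with (11) entry by entry. Since every $\alpha_j^{k+1}(i) = \Sigma_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2$ is non-negative, so is the average, i.e., the minimizer is a valid variance vector without imposing an explicit non-negativity constraint.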

Unlike the non-convex M-step objective function in (10), the surrogate objective function in (12) is convex in $\gamma$ and therefore can be minimized in a distributed manner using powerful convex optimization techniques. An alternate form of (12) amenable to distributed optimization is given by

$$\min_{\{\gamma_j\}_{j \in \mathcal{J}}} \sum_{j \in \mathcal{J}} \left\|\gamma_j - \alpha_j^{k+1}\right\|_2^2 \quad \text{subject to} \quad \gamma_j = \gamma_{j'} \;\; \forall j \in \mathcal{J},\; j' \in \mathcal{N}_j, \tag{13}$$

where $\mathcal{N}_j$ denotes the set of single hop neighbors of node $j$. For a connected graph $\mathcal{G}$, the equality constraints in (13) ensure its equivalence to the unconstrained optimization in (12). Here, the number of equality constraints is equal to $|\mathcal{A}|$, i.e., the total number of single hop links in the network. In a conventional decentralized implementation of (13), the number of messages exchanged between the nodes grows linearly with the number of consensus constraints. By restricting the nodes to exchange information only through a relatively small set of pre-designated nodes called bridge nodes, the number of consensus constraints can be drastically reduced without affecting the equivalence of (12) and (13). Let $\mathcal{B} \subseteq \mathcal{J}$ denote the set of all bridge nodes in the network and $\mathcal{B}_j \subseteq \mathcal{B}$ denote the set of bridge nodes belonging to the single hop neighborhood of node $j$; then (13) can be rewritten as


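$$\underset{\{\gamma_j\}_{j \in \mathcal{J}},\, \{\gamma_b\}_{b \in \mathcal{B}}}{\text{minimize}} \; \sum_{j \in \mathcal{J}} \left\|\gamma_j - \alpha_j^{k+1}\right\|_2^2 \quad \text{subject to} \quad \gamma_j = \gamma_b \;\; \forall j \in \mathcal{J},\; b \in \mathcal{B}_j. \tag{14}$$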



Fig. 1: Selection of bridge nodes in a sample network consisting of 10 nodes. 
In the proposed scheme, only those edges that have at least one of the vertices 
as a bridge node are used for communication. The remaining edges are not 
used for communication. For example, node 9 communicates only with bridge 
nodes 4 and 8. 


Using bridge nodes to impose network-wide consensus, as motivated in [23], allows us to trade off between the communication cost and the robustness of the distributed optimization algorithm².

The following lemma provides sufficient conditions on the choice of the bridge node set $\mathcal{B}$ under which (12) and (14) are equivalent. The proof of the lemma can be found in [23].


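Lemma 1. For a connected graph $\mathcal{G}$, if the bridge node set $\mathcal{B} \subseteq \mathcal{J}$ satisfies the following conditions:

1) each node $S_j$ is connected to at least one bridge node in $\mathcal{B}$, i.e., $\mathcal{B}_j \neq \emptyset$ for any $j \in \mathcal{J}$, and

2) if two nodes $S_{j_1}$ and $S_{j_2}$ are single-hop neighbors, then $\mathcal{B}_{j_1} \cap \mathcal{B}_{j_2} \neq \emptyset$ for any $j_1, j_2 \in \mathcal{J}$,

then, in the solution to (14), the $\gamma_j$ are equal for all $j \in \mathcal{J}$.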


Fig. 1 illustrates the selection of bridge nodes according to Lemma 1 in a sample network. In this work, we employ the Alternating Directions Method of Multipliers (ADMM) algorithm [29] to solve the convex optimization problem in (14). ADMM is a widely used dual ascent based method for solving constrained convex optimization problems; it lends itself naturally to a decentralized implementation and, for consensus problems with strongly convex local objectives such as (14), converges at a linear rate.

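We start by constructing an augmented Lagrangian, $L_\rho$, given by

$$L_\rho\left(\gamma_{\mathcal{J}}, \gamma_{\mathcal{B}}, \lambda\right) = \sum_{j \in \mathcal{J}} \left\|\gamma_j - \alpha_j^{k+1}\right\|_2^2 + \sum_{j \in \mathcal{J}} \sum_{b \in \mathcal{B}_j} (\lambda_j^b)^T \left(\gamma_j - \gamma_b\right) + \frac{\rho}{2} \sum_{j \in \mathcal{J}} \sum_{b \in \mathcal{B}_j} \left\|\gamma_j - \gamma_b\right\|_2^2, \tag{15}$$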

where $\lambda_j^b$ denotes the $n \times 1$ Lagrange multiplier vector corresponding to the equality constraint $\gamma_j = \gamma_b$, and $\rho$ is a positive scalar which weights the quadratic consensus penalty term. For ease of notation, we define the concatenated vectors $\gamma_{\mathcal{J}} = (\gamma_1^T, \gamma_2^T, \ldots, \gamma_L^T)^T$ and $\gamma_{\mathcal{B}} = (\gamma_{b_1}^T, \ldots, \gamma_{b_{|\mathcal{B}|}}^T)^T$ to be used in the sequel. We also define the $nN_c \times 1$ concatenated Lagrange multiplier vector $\lambda$, where $N_c$ is the number of equality constraints in (14).

The auxiliary variables $\gamma_b$, called bridge parameters, are used to establish consensus among the local variables $\gamma_j$. Each bridge parameter $\gamma_b$ is a non-negative $n$ length vector maintained by bridge node $b$, following [23].

²In an alternate embodiment of the proposed algorithm, the message exchanges could be restricted to occur only through the (trustworthy) bridge nodes, thereby avoiding direct communication between the nodes. In this case, the role of the bridge nodes could be to enforce consensus in $\gamma$ across the nodes, and these nodes need not directly participate in signal reconstruction.

The solution to (14) is then obtained by executing the following ADMM iterations until convergence:



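$$\gamma_{\mathcal{J}}^{r+1} = \arg\min_{\gamma_{\mathcal{J}}} L_\rho\left(\gamma_{\mathcal{J}}, \gamma_{\mathcal{B}}^{r}, \lambda^{r}\right), \tag{16}$$

$$\gamma_{\mathcal{B}}^{r+1} = \arg\min_{\gamma_{\mathcal{B}}} L_\rho\left(\gamma_{\mathcal{J}}^{r+1}, \gamma_{\mathcal{B}}, \lambda^{r}\right), \tag{17}$$

$$(\lambda_j^b)^{r+1} = (\lambda_j^b)^{r} + \rho\left(\gamma_j^{r+1} - \gamma_b^{r+1}\right), \tag{18}$$

for all $j \in \mathcal{J}$, $b \in \mathcal{B}_j$. Here, $r$ denotes the ADMM iteration index. In (16) and (17), the primal variables $\gamma_{\mathcal{J}}$ and $\gamma_{\mathcal{B}}$ are updated in a Gauss-Seidel fashion by minimizing the augmented Lagrangian $L_\rho$, evaluated at the previous estimate of the dual variable $\lambda$. Because of the extra quadratic penalty term added to the original Lagrangian, the objective in (17) is no longer affine in $\gamma_{\mathcal{B}}$ and hence has a bounded minimizer. The dual variable $\lambda$ is updated via a gradient ascent step (18) with a step-size equal to the ADMM parameter $\rho$. This particular choice of step-size ensures the dual feasibility of the iterates $\{\gamma_{\mathcal{J}}^{r+1}, \gamma_{\mathcal{B}}^{r+1}, \lambda^{r+1}\}$ for all $r$. Since the augmented Lagrangian $L_\rho$ is strictly convex with respect to $\gamma_{\mathcal{J}}$ and $\gamma_{\mathcal{B}}$ individually, the zero gradient optimality conditions for (16) and (17) translate into simple update equations for $\gamma_j$ and $\gamma_b$: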


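$$\gamma_j^{r+1} = \frac{2\alpha_j^{k+1} + \sum_{b \in \mathcal{B}_j} \left(\rho\, \gamma_b^{r} - (\lambda_j^b)^{r}\right)}{2 + \rho|\mathcal{B}_j|} \quad \forall j \in \mathcal{J}, \tag{19}$$

and

$$\gamma_b^{r+1} = \frac{1}{\rho|\mathcal{M}_b|} \sum_{j \in \mathcal{M}_b} \left((\lambda_j^b)^{r} + \rho\, \gamma_j^{r+1}\right) \quad \forall b \in \mathcal{B}. \tag{20}$$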


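Here, $\mathcal{M}_b$ denotes the set of nodes connected to bridge node $b$. As shown in the Appendix, by eliminating the Lagrange multiplier terms from (18) and (20), the update rule for $\gamma_b$ can be further simplified to

$$\gamma_b^{r+1} = \frac{1}{|\mathcal{M}_b|} \sum_{j \in \mathcal{M}_b} \gamma_j^{r+1} \quad \forall b \in \mathcal{B}. \tag{21}$$

The key step is that summing (18) over $j \in \mathcal{M}_b$ and substituting (20) gives $\sum_{j \in \mathcal{M}_b} (\lambda_j^b)^{r+1} = 0$ after every update; with zero-initialized multipliers this sum is zero for all $r$, so the multiplier terms in (20) cancel.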
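The updates (19), (21) and (18) only use quantities available at a node or at one of its bridge nodes, so one inner ADMM loop is easy to simulate on a single machine. The sketch below (NumPy; the function name `bridge_admm`, the dictionary-based graph encoding and the zero initialization are our assumptions) solves (14) for fixed $\{\alpha_j^{k+1}\}$; if the bridge set satisfies Lemma 1, every $\gamma_j$ should approach the network-wide average of the $\alpha_j^{k+1}$, i.e., the solution of (12).

```python
import numpy as np

def bridge_admm(alphas, B, rho=1.0, r_max=200):
    """Bridge node ADMM for (14) with fixed alpha_j (one M-step).

    alphas: dict node j -> alpha_j (length-n array)
    B:      dict node j -> list of bridge nodes B_j that node j talks to
    """
    M = {}
    for j, bridges in B.items():
        for b in bridges:
            M.setdefault(b, []).append(j)                   # M_b: nodes attached to bridge b
    n = len(next(iter(alphas.values())))
    g_b = {b: np.zeros(n) for b in M}                       # bridge parameters gamma_b
    lam = {(j, b): np.zeros(n) for j in B for b in B[j]}    # zero-initialized multipliers
    g = {}
    for _ in range(r_max):
        for j, a in alphas.items():                         # (19): local update at node j
            s = sum(rho * g_b[b] - lam[(j, b)] for b in B[j])
            g[j] = (2 * a + s) / (2 + rho * len(B[j]))
        for b, nodes in M.items():                          # (21): plain average at bridge b
            g_b[b] = np.mean([g[j] for j in nodes], axis=0)
        for (j, b) in lam:                                  # (18): dual ascent with step rho
            lam[(j, b)] = lam[(j, b)] + rho * (g[j] - g_b[b])
    return g, g_b, lam

# Path 0-1-2-3-4 with bridge nodes {1, 3}; each node counts itself as a neighbor
rng = np.random.default_rng(3)
alphas = {j: rng.uniform(0.0, 1.0, 4) for j in range(5)}
B = {0: [1], 1: [1], 2: [1, 3], 3: [3], 4: [3]}
g, g_b, lam = bridge_admm(alphas, B, rho=1.0, r_max=300)
```

In this example node 2 is the only node attached to both bridges, so it is what couples the two halves of the network; removing bridge 1 from $\mathcal{B}_2$ would violate condition 2) of Lemma 1 for the pair (1, 2).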




In Section IV-F, we compare the bridge node based ADMM discussed above with other decentralized optimization techniques available in the literature. We show empirically that the bridge node based ADMM scheme is able to flexibly trade off between communication complexity, robustness to node failures, speed of convergence, and signal reconstruction performance.


B. CB-DSBL Algorithm 

We now propose the CB-DSBL algorithm. Essentially, it is a decentralized EM algorithm for finding the ML estimate of the hyperparameters $\gamma$. The algorithm comprises two nested loops. In the outer loop, each node performs the E-step (8) in a standalone manner. In the inner loop, ADMM iterations are performed to solve the M-step optimization in a decentralized manner. Upon convergence of the outer loop, each node $j \in \mathcal{J}$ has the same ML estimate of $\gamma$, which is then used to obtain a MAP estimate of the local sparse vector $x_j$, similar to the centralized algorithm. The steps of the CB-DSBL algorithm are detailed in Algorithm 1.
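Combining the local E-step with the bridge node ADMM M-step gives a single-machine simulation of the two nested loops. The sketch below is a minimal NumPy illustration, not the authors' implementation: the function name `cb_dsbl`, the warm start of $\gamma_b$ and $\lambda$ across EM iterations, the small positivity guard on $\gamma_j$, and the global (rather than per-node) stopping test are all our own simplifying assumptions. It expects $r_{\max} \ge 1$ and a bridge assignment satisfying Lemma 1.

```python
import numpy as np

def e_step(y, Phi, s2, gamma):
    """Local E-step (8) at one node, Woodbury form (Gamma is never inverted)."""
    G_PhiT = gamma[:, None] * Phi.T
    C = s2 * np.eye(Phi.shape[0]) + Phi @ G_PhiT
    K = np.linalg.solve(C, G_PhiT.T).T
    return K @ y, np.diag(gamma) - K @ G_PhiT.T

def cb_dsbl(ys, Phis, sigma2, B, rho=1.0, k_max=30, r_max=300, eps=1e-8):
    """Simulate CB-DSBL; B maps node j -> list of its bridge nodes B_j."""
    L, n = len(ys), Phis[0].shape[1]
    M = {}
    for j in range(L):
        for b in B[j]:
            M.setdefault(b, []).append(j)                   # M_b
    gam = [np.full(n, 1e-3) for _ in range(L)]              # gamma_j^0 = 1e-3 * ones
    g_b = {b: np.zeros(n) for b in M}
    lam = {(j, b): np.zeros(n) for j in range(L) for b in B[j]}
    for _ in range(k_max):
        # E step: local computation only, no message exchange
        alphas = []
        for y, Phi, s2, g in zip(ys, Phis, sigma2, gam):
            mu, Sigma = e_step(y, Phi, s2, g)
            alphas.append(np.diag(Sigma) + mu**2)
        # M step: r_max bridge node ADMM iterations; g_b and lam are warm-started
        new = [None] * L
        for _ in range(r_max):
            for j in range(L):                                # (19)
                s = sum(rho * g_b[b] - lam[(j, b)] for b in B[j])
                new[j] = (2 * alphas[j] + s) / (2 + rho * len(B[j]))
            for b, nodes in M.items():                        # (21)
                g_b[b] = np.mean([new[j] for j in nodes], axis=0)
            for (j, b) in lam:                                # (18)
                lam[(j, b)] = lam[(j, b)] + rho * (new[j] - g_b[b])
        new = [np.maximum(g, 1e-12) for g in new]             # positivity guard (our addition)
        delta = max(np.linalg.norm(u - v) for u, v in zip(new, gam))
        gam = new
        if delta <= eps:                                      # global stopping test (simplification)
            break
    x_hat = [e_step(y, Phi, s2, g)[0] for y, Phi, s2, g in zip(ys, Phis, sigma2, gam)]
    return gam, x_hat
```

With enough inner iterations the local $\gamma_j$ track the centralized M-SBL iterates; reducing `r_max` trades that accuracy for fewer message exchanges per M-step.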

Each ADMM iteration in the M-step of the CB-DSBL 
algorithm involves two rounds of communication (Steps 2 and 


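Algorithm 1 Consensus Based Distributed Sparse Bayesian Learning (CB-DSBL)

Initializations: $k \leftarrow 0$; $\gamma_j^0 \leftarrow 10^{-3} \cdot \mathbf{1}_{n \times 1}$; $\gamma_b^0, (\lambda_j^b)^0 \leftarrow 0$ $\forall j \in \mathcal{J}, b \in \mathcal{B}_j$
while $(k < k_{\max})$ and $(\Delta\gamma_j > \epsilon)$ do
    E step: Each node $S_j$, $j \in \mathcal{J}$, computes $\mu_j^{k+1}$ and $\Sigma_j^{k+1}$ according to (8) using its local estimate $\gamma_j^k$, and forms $\alpha_j^{k+1}$.
    M step: $r \leftarrow 0$
    while $r < r_{\max}$ do
        1. All nodes $S_{j \in \mathcal{J}}$ update their local estimate of hyperparameters $\gamma_j^{r+1}$ according to (19).
        2. All nodes $S_{j \in \mathcal{J}}$ transmit the updated $\gamma_j^{r+1}$ estimate to their connected bridge nodes $\mathcal{B}_j$.
        3. Each bridge node $S_{b \in \mathcal{B}}$ updates its bridge variable $\gamma_b^{r+1}$ according to (21).
        4. All bridge nodes $S_{b \in \mathcal{B}}$ transmit the updated bridge hyperparameters $\gamma_b^{r+1}$ to the nodes in their neighborhood $\mathcal{M}_b$.
        5. All nodes $S_{j \in \mathcal{J}}$ update their Lagrange multipliers $\{(\lambda_j^b)^{r+1}\}$, $b \in \mathcal{B}_j$, according to (18).
        6. $r \leftarrow r + 1$
    end
    $\gamma_j^{k+1} \leftarrow \gamma_j^{r}$, $\Delta\gamma_j \leftarrow \|\gamma_j^{k+1} - \gamma_j^{k}\|_2$
    $\gamma_b^{0} \leftarrow \gamma_b^{r}$, $(\lambda_j^b)^{0} \leftarrow (\lambda_j^b)^{r}$
    $k \leftarrow k + 1$
end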


4) between the nodes. In the first communication round, each 
node j G J transmits 7 ^ G M" to its \Bj \ single hop neighbors. 
In the second communication round, each bridge node b G B 
transmits 7 ^ G K" to its |A/&| single hop neighbors. Thus, 
in each M-step, numbers are exchanged 

between the nodes and their respective bridge nodes. In Eig.|^ 
we compare different variants of CB-DSBL with respect to 
the average number of inter-node message exchanges required 
to achieve less than 1% signal reconstruction error. Erom 
the figure, it is evident that the aforementioned bridge node 
based ADMM technique is effective in reducing the overall 
inter-node communication and the associated costs, without 
compromising on signal reconstruction performance. One of 
the ways of selecting the bridge node set B is to sort the 
nodes in decreasing order of their nodal degrees and retain the 
least number of top most \B\ nodes satisfying the conditions 
in Lemma [T] Although suboptimal, this scheme is able to 
significantly reduce the overall communication complexity 
of the algorithm as demonstrated empirically in Eig. In 
section |IV-D| a rule of thumb policy is discussed to select 
the bridge nodes B which will ensure fast convergence of the 
decentralized ADMM iterations in the M-step of CB-DSBL 
algorithm. 
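To make the degree-ordered bridge selection and the per-iteration message count concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `erdos_renyi`, `select_bridges`, `bridge_sets` and `numbers_per_admm_iteration` are hypothetical helpers; `select_bridges` replaces the conditions of Lemma 1 (not reproduced in this section) with the simplified rule that every node must have at least one bridge node in its closed one-hop neighborhood; and `bridge_sets` assumes that a bridge node counts as its own bridge.

```python
import random


def erdos_renyi(L, p, seed=0):
    """Undirected Erdos-Renyi graph G(L, p) as a list of neighbor sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def select_bridges(adj):
    """Degree-ordered greedy selection (hypothetical stand-in for Lemma 1).

    ASSUMPTION: the only condition checked is that every node has at least
    one bridge node in its closed neighborhood {j} | adj[j].
    """
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    for k in range(1, len(order) + 1):
        chosen = set(order[:k])
        if all(({v} | adj[v]) & chosen for v in range(len(adj))):
            return order[:k]
    return order  # only reached for an empty graph


def bridge_sets(adj, bridges):
    """B_j: bridge nodes in the closed neighborhood of node j (assumed definition)."""
    chosen = set(bridges)
    return [sorted(({j} | adj[j]) & chosen) for j in range(len(adj))]


def numbers_per_admm_iteration(B, n):
    """Numbers exchanged in one ADMM iteration (Steps 2 and 4 of Algorithm 1)."""
    N = {}  # N_b: nodes attached to bridge b
    for j, Bj in enumerate(B):
        for b in Bj:
            N.setdefault(b, []).append(j)
    round1 = n * sum(len(Bj) for Bj in B)           # node j -> each b in B_j
    round2 = n * sum(len(Nb) for Nb in N.values())  # bridge b -> each j in N_b
    return round1 + round2
```

Because each constraint $\gamma_j = \gamma_b$ is counted once from each side, the two rounds carry the same volume, so the total per ADMM iteration is $2nN_c$ with $N_c = \sum_j |\mathcal{B}_j|$.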

Further reduction in inter-node communication is possible by executing only a finite number of ADMM iterations per M-step. In practice, running a single ADMM iteration per M-step is sufficient for CB-DSBL to converge. As shown in Fig. 3, beyond two or three ADMM iterations per M-step, there is only a marginal improvement in the quality of the solution as well as in the convergence speed. Fig. 4 shows that even with a single ADMM iteration per M-step, CB-DSBL typically converges quite rapidly to the centralized solution.

[Fig. 2 plot omitted; x-axis: Network size (L)]

Fig. 2: Comparison of the communication complexity of CB-DSBL variants based on 'bridge node' ADMM, CA-MoM, D-ADMM and EXTRA [32] algorithms. The plot shows the average number of messages exchanged between nodes in order to achieve less than 1% signal reconstruction error (-20 dB NMSE). The total number of message exchanges shown here is averaged across 500 trials. Other simulation parameters: n = 50, m = 10, 10% sparsity, SNR = 30 dB.

[Fig. 4 plots omitted; x-axis: Iterations (k); panels include (c) L = 20 nodes, SNR = 10 dB and (d) L = 20 nodes, SNR = 20 dB]

Fig. 4: Convergence of decentralized CB-DSBL to the centralized M-SBL solution for different network sizes and SNRs. The CB-DSBL variant used here executes a single ADMM iteration per EM iteration. Other simulation parameters: n = 50, m = 10 and 10% sparsity.

Fig. 3: This plot illustrates the sensitivity of CB-DSBL's outer loop iterations to the number of ADMM iterations executed per M-step in the inner loop of the algorithm. Each point on the curve represents the average number of overall CB-DSBL iterations needed to achieve less than 1% signal reconstruction error for a given number of ADMM iterations executed in the inner loop. Simulation parameters: n = 100, m = 10, L = 10, 5% sparsity, SNR = 30 dB and number of trials = 100.


C. Convergence of ADMM Iterations in the M-step 

In this section, we analyze the convergence of the ADMM iterations (18), (19) and (21) derived for the M-step optimization in CB-DSBL. By doing so, we aim to highlight the effects of the bridge node set $\mathcal{B}$ and the augmented Lagrangian parameter $\rho$ on the convergence of the ADMM iterations.

ADMM has been a very popular choice for solving convex, and more recently nonconvex, optimization problems in a distributed setup. In its classical form, ADMM solves the following constrained optimization problem:

$$\min_{x,\,z}\; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c, \tag{22}$$


where $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$ are the primal variables. The matrices $A$, $B$ and the vector $c$ appearing in the linear equality constraint are of appropriate dimensions. The functions $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^m \to \mathbb{R}$ are convex with respect to $x$ and $z$, respectively. In [35], the authors have shown a linear convergence rate for the classical ADMM iterations under the assumptions of strict convexity and Lipschitz gradient on one of $f$ or $g$, along with a full row rank assumption on the matrix $A$. However, in the ADMM formulation of a decentralized consensus optimization problem, the coefficient matrix $A$ is seldom of full row rank. In [36], the full row rank condition on $A$ was relaxed, and a linear rate of convergence was established for decentralized ADMM iterations for a generic convex optimization problem with linear consensus constraints similar to (13). In [37], the convergence of ADMM for solving an average consensus problem has been analyzed for both noiseless and noisy communication links. In both [36] and [37], the secondary primal variables indicated by the entries of $z$ have a one-to-one correspondence with the communication links between the network nodes. However, such a bijection is missing for the bridge variables used in our work for enforcing consensus between the primal variables. Due to this, the convergence results of [36], [37] are not directly applicable to our case. In the sequel, we present the analysis of the convergence of decentralized ADMM iterations for the bridge node based inter-node communication scheme.

Fig. 5: Construction of block matrices $E_1$ and $E_2$ for a sample 5 node network. The matrices $E_1$ and $E_2$ are together used to enforce the linear consensus constraints in (14), as shown in (23). Notice the correspondence between the diagonal coefficients of $E_1^T E_1$ and the number of bridge node connections per node.

We start by defining the block matrices $E_1 = C_1 \otimes I_n$ and $E_2 = C_2 \otimes I_n$ of sizes $nN_c \times nL$ and $nN_c \times n|\mathcal{B}|$, respectively. The rows of $C_1$ and $C_2$ encode the $N_c$ equality constraints in (14) such that if the $i$-th equality constraint is $\gamma_j = \gamma_{b_k}$, $b_k \in \mathcal{B}$, then $C_1(i,j) = 1$ and $C_2(i,k) = -1$, with the rest of the entries in the $i$-th row being zero. It can easily be shown that the minimum and maximum numbers of bridge nodes connected to any node in the network are equal to the minimum and maximum eigenvalues of $E_1^T E_1$, denoted by $\sigma^2_{\min}$ and $\sigma^2_{\max}$, respectively. Fig. 5 illustrates the construction of the block matrices $E_1$ and $E_2$ for an example network consisting of 5 nodes. Using the newly defined terms, the optimization problem in (14) can be rewritten compactly as


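$$\min_{\gamma_J,\,\gamma_B}\; f(\gamma_J) \quad \text{s.t.} \quad E_1\gamma_J + E_2\gamma_B = 0, \tag{23}$$

where $f: \mathbb{R}^{nL} \to \mathbb{R}$ denotes the objective function in (14), which depends only on $\gamma_J$. The augmented Lagrangian $L_\rho$ corresponding to (23) can also be rewritten compactly as

$$L_\rho(\gamma_J, \gamma_B, \lambda) = f(\gamma_J) + \lambda^T(E_1\gamma_J + E_2\gamma_B) + \frac{\rho}{2}(E_1\gamma_J + E_2\gamma_B)^T(E_1\gamma_J + E_2\gamma_B). \tag{24}$$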

By construction, the block matrix $E_1$ has full column rank, as its columns have mutually disjoint supports. However, $E_1$ can be row rank deficient due to repeated rows caused by a node being connected to multiple bridge nodes, which is often the case. When $E_1$ is row rank deficient, the ADMM convergence results of [35] are not applicable to (23). Theorem 1 below summarizes the convergence of the ADMM iterations (18), (19) and (21) to their fixed point. The result in Theorem 1 holds for any $f$ that is strongly convex with strong convexity constant $m_f$ and has an $M_f$-Lipschitz continuous gradient.
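The structure of $C_1$ and $C_2$ described above can be checked on a toy example. The Python sketch below (plain lists, no linear-algebra library) builds $C_1$ and $C_2$ row by row from a hypothetical bridge assignment $\mathcal{B}_j$, confirms that $C_1^TC_1$ is diagonal with entries $|\mathcal{B}_j|$ (since $E_1^TE_1 = (C_1^TC_1)\otimes I_n$, these are also the eigenvalues of $E_1^TE_1$), and exhibits the repeated rows that make $E_1$ row rank deficient. The 5-node assignment is illustrative only and is not the network drawn in Fig. 5; the helper names are ours.

```python
def constraint_matrices(B, bridges):
    """C1 (N_c x L) and C2 (N_c x |B|): one row per constraint gamma_j = gamma_b."""
    col = {b: k for k, b in enumerate(bridges)}
    L, C1, C2 = len(B), [], []
    for j, Bj in enumerate(B):
        for b in Bj:
            r1 = [0] * L
            r1[j] = 1                 # +gamma_j
            r2 = [0] * len(bridges)
            r2[col[b]] = -1           # -gamma_b
            C1.append(r1)
            C2.append(r2)
    return C1, C2


def gram(C):
    """C^T C for a matrix stored as a list of rows."""
    cols = len(C[0])
    return [[sum(row[p] * row[q] for row in C) for q in range(cols)] for p in range(cols)]


# Hypothetical 5-node example: nodes 1 and 3 act as bridges, node 2 sees both.
bridges = [1, 3]
B = [[1], [1], [1, 3], [3], [3]]
C1, C2 = constraint_matrices(B, bridges)
G1 = gram(C1)
diag = [G1[i][i] for i in range(len(G1))]   # number of bridge nodes per node
kappa = max(diag) / min(diag)               # ratio of extreme eigenvalues of E1^T E1
```

Node 2 contributes two identical rows to $C_1$, which is exactly the row rank deficiency discussed above, and here $\kappa = 2$.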


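Theorem 1. Let $\{\gamma_J^*, \gamma_B^*\}$ and $\lambda^*$ denote the unique primal and dual optimal solutions, and let the vector $u$ be constructed as $u^r = [(E_2\gamma_B^r)^T\ (\lambda^r)^T]^T$ (similarly for $u^*$). Then, the following hold.

1) The sequence $u^r$ is Q-linearly convergent† to $u^*$, i.e.,
$$\|u^{r+1} - u^*\|_G^2 \le \frac{1}{1+\delta}\,\|u^r - u^*\|_G^2, \tag{25}$$
where $\delta$ is evaluated as
$$\delta = \max_{\mu>1,\,\nu>1}\, \min\left\{ \frac{2m_f}{\mu\rho\,\sigma^2_{\max} + \dfrac{\nu M_f^2}{(\nu-1)\rho\,\sigma^2_{\min}}},\; \frac{\sigma^2_{\min}}{\nu\,\sigma^2_{\max}},\; \frac{\mu-1}{\mu} \right\}. \tag{26}$$

2) The primal sequence $\gamma_J^r$ is R-linearly convergent‡ to $\gamma_J^*$, i.e.,
$$\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 \le \frac{1}{2m_f}\,\|u^r - u^*\|_G^2, \tag{27}$$
where $\|\cdot\|_G$ denotes the weighted norm with respect to the block-diagonal matrix $G = \mathrm{diag}(\rho I, \rho^{-1} I)$, whose identity blocks conform to $E_2\gamma_B$ and $\lambda$, respectively.

Proof. See Appendix C. □

† A sequence $\{x_k\}: \mathbb{Z}_+ \to \mathbb{R}$ is said to be Q-linearly convergent to $L$ if there exists $\mu \in (0,1)$ such that $\lim_{k\to\infty} \frac{|x_{k+1} - L|}{|x_k - L|} = \mu$.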

According to Theorem 1, the primal optimality gap $\|\gamma_J^r - \gamma_J^*\|_2$ decays R-linearly with each ADMM iteration. Moreover, since $\gamma_J^*$ is primal feasible, there is consensus among $\gamma_j$, $j \in \mathcal{J}$, upon convergence, implying that each node effectively minimizes the centralized M-step cost function in (10).
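The consensus behaviour described above can be reproduced with a toy version of the bridge-node ADMM iterations. The Python sketch below is not the CB-DSBL M-step: as a labeled assumption, it replaces the objective in (14) by the separable quadratic $f(\gamma_J) = \sum_j (\gamma_j - a_j)^2$ with scalar $\gamma_j$ (so $m_f = M_f = 2$ and $\kappa_f = 1$), whose consensus minimizer is the mean of the hypothetical data $a_j$. The three updates are the closed-form minimizers of the augmented Lagrangian (24) for this toy cost under the constraint convention $\gamma_j - \gamma_b = 0$ implied by $C_1(i,j) = 1$, $C_2(i,k) = -1$, and $\rho$ is set to $M_f/\sigma^2_{\min}$, the value to which $\rho_{\text{opt}}$ in (28) reduces when $\kappa_f = 1$. The function name is ours.

```python
def toy_bridge_admm(a, B, rho, iters=2000):
    """Bridge-node ADMM for the toy problem min sum_j (g_j - a_j)^2 s.t. g_j = g_b, b in B_j."""
    N = {}                                   # N_b: nodes attached to bridge b
    for j, Bj in enumerate(B):
        for b in Bj:
            N.setdefault(b, []).append(j)
    g = list(a)                              # local estimates gamma_j
    gb = {b: 0.0 for b in N}                 # bridge variables gamma_b
    lam = {(j, b): 0.0 for j, Bj in enumerate(B) for b in Bj}  # multipliers start at zero
    for _ in range(iters):
        # Local update: zero gradient of the augmented Lagrangian with respect to g_j.
        for j, Bj in enumerate(B):
            num = 2.0 * a[j] - sum(lam[j, b] for b in Bj) + rho * sum(gb[b] for b in Bj)
            g[j] = num / (2.0 + rho * len(Bj))
        # Bridge update: zero gradient with respect to g_b. The multipliers attached
        # to each bridge sum to zero, so this equals the plain neighborhood average.
        for b, Nb in N.items():
            gb[b] = sum(g[j] + lam[j, b] / rho for j in Nb) / len(Nb)
        # Dual ascent on the constraints g_j - g_b = 0.
        for (j, b) in lam:
            lam[j, b] += rho * (g[j] - gb[b])
    return g, gb


a = [1.0, 2.0, 3.0, 4.0, 10.0]          # hypothetical local data, mean 4.0
B = [[1], [1], [1, 3], [3], [3]]        # hypothetical bridge assignment
rho = 2.0 / min(len(Bj) for Bj in B)    # M_f / sigma_min^2 with M_f = 2
g, gb = toy_bridge_admm(a, B, rho)
```

All local estimates and both bridge variables settle at the centralized minimizer 4.0, even though nodes 0 and 4 never share a bridge node.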


D. Selection of the Augmented Lagrangian Parameter ρ

From (25) and (27) in Theorem 1, we observe that to optimize the decay of the primal optimality gap between $\gamma_J^r$ and $\gamma_J^*$ in each ADMM iteration, the augmented Lagrangian parameter $\rho$ has to be chosen such that it maximizes $\delta$ in (26). Theorem 2 reveals the optimal value of $\rho$ and the corresponding value of $\delta$.


Theorem 2. The optimal value of the augmented Lagrangian parameter $\rho$, which uniquely maximizes $\delta$ as defined in (26), is given by
$$\rho_{\text{opt}} = \frac{M_f}{\sigma_{\max}\sigma_{\min}} \sqrt{\frac{\sqrt{(\kappa-1)^2 + 4\kappa\kappa_f^2} + (\kappa-1)}{\sqrt{(\kappa-1)^2 + 4\kappa\kappa_f^2} - (\kappa-1)}}. \tag{28}$$
The corresponding maximal value of $\delta$ is given by
$$\delta_{\text{opt}} = \frac{2}{\kappa + 1 + \sqrt{(\kappa-1)^2 + 4\kappa\kappa_f^2}}, \tag{29}$$
where $\kappa_f = M_f/m_f$ represents the condition number of the objective function in (14) and $\kappa = \sigma^2_{\max}/\sigma^2_{\min}$ is the ratio of the maximum and minimum eigenvalues of $E_1^T E_1$.

Proof. See Appendix E. □

‡ A sequence $\{x_k\}: \mathbb{Z}_+ \to \mathbb{R}$ is said to be R-linearly convergent to $L$ if there exists a Q-linearly convergent sequence $\{y_k\}$ which converges to zero such that $|x_k - L| \le y_k$ for all $k$.

[Fig. 6 plots omitted; legend: SNR = 20 dB, L = 10; SNR = 20 dB, L = 20; SNR = 15 dB, L = 10; SNR = 15 dB, L = 20]

Fig. 6: Left and right plots show the sensitivity of the number of iterations required for convergence and of the NMSE, respectively, to the ADMM parameter $\rho$. A scale factor of 1 corresponds to $\rho_{\text{opt}}$ in (28).

From (29), we observe that the convergence rate of the ADMM iterations in the M-step of the CB-DSBL algorithm depends upon two factors: $\kappa$ and $\kappa_f$. A value of $\kappa$ close to its minimum value of unity results in faster convergence of the ADMM iterations. Since the ratio $\kappa = \sigma^2_{\max}/\sigma^2_{\min}$ is also equal to the ratio of the maximum and minimum number of bridge nodes per node in the network, a rule of thumb for bridge node selection is to ensure that each node is connected to roughly the same number of bridge nodes. The convergence rate also depends upon $\kappa_f$, which reflects how well conditioned the function $f$ is. For the case where $f$ is the objective function in (14), it is easy to show that $m_f = M_f = 2$ and $\kappa_f = 1$. Thus, specific to CB-DSBL, the optimal ADMM parameter is $\rho_{\text{opt}} = 2/\sigma^2_{\min}$, and the corresponding $\delta_{\text{opt}} = 1/(1+\kappa)$. For a given network connectivity graph $\mathcal{G}$, $\rho_{\text{opt}}$ can be computed off-line and programmed in each node. As shown in Fig. 6, the average MSE and mean number of iterations vary widely with $\rho$, and an inappropriate choice of $\rho$ results in slow convergence and poor reconstruction performance. Also, the $\rho_{\text{opt}}$ computed in (28) is very close to the value of $\rho$ that results in both the fastest convergence and the lowest average MSE.
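Equations (28) and (29) are cheap to evaluate off-line once the bridge assignment is fixed. The Python sketch below (the function name is ours) computes $\rho_{\text{opt}}$ and $\delta_{\text{opt}}$ from $M_f$, $m_f$ and the extreme eigenvalues of $E_1^TE_1$, i.e., the minimum and maximum number of bridge nodes attached to any node; for the CB-DSBL objective ($m_f = M_f = 2$) it reproduces $\rho_{\text{opt}} = 2/\sigma^2_{\min}$ and $\delta_{\text{opt}} = 1/(1+\kappa)$.

```python
import math


def admm_rho_delta_opt(M_f, m_f, sigma2_min, sigma2_max):
    """Evaluate rho_opt from (28) and delta_opt from (29).

    sigma2_min / sigma2_max are the extreme eigenvalues of E1^T E1, i.e. the
    minimum / maximum number of bridge nodes attached to any node.
    """
    kappa_f = M_f / m_f                  # condition number of f
    kappa = sigma2_max / sigma2_min      # eigenvalue spread of E1^T E1
    S = math.sqrt((kappa - 1) ** 2 + 4 * kappa * kappa_f ** 2)
    rho = M_f / math.sqrt(sigma2_max * sigma2_min) * math.sqrt((S + kappa - 1) / (S - kappa + 1))
    delta = 2 / (kappa + 1 + S)
    return rho, delta


# Example: every node sees 1 or 2 bridge nodes, CB-DSBL objective (m_f = M_f = 2).
rho, delta = admm_rho_delta_opt(2.0, 2.0, 1.0, 2.0)
```

Since (29) is decreasing in $\kappa_f$, an ill-conditioned $f$ lowers $\delta_{\text{opt}}$ for a given bridge layout, which is why the case $\kappa_f = 1$ is favorable.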

E. Computational Complexity of CB-DSBL 

In this section, we discuss the computational complexity of the steps involved in a single iteration of the CB-DSBL algorithm. The local E-step requires $O(n^2m + nm^2 + m^3)$ elementary operations at each node. The M-step is executed as multiple (say, $r_{\max}$) ADMM iterations. A single ADMM iteration involves updating the local hyperparameter estimate $\gamma_j$ and the Lagrange multipliers, which takes $O(\zeta n)$ computations per node, $\zeta$ being the highest number of bridge nodes assigned to any node in the network. Further, each bridge node $b \in \mathcal{B}$ has to perform an additional $O(\zeta n)$ computations to update the local bridge parameters $\gamma_b$ in every ADMM iteration. Thus, the overall computational complexity of a single CB-DSBL iteration at each node is $O(n^2m + nm^2 + m^3 + \zeta n r_{\max})$ and, as desired, it does not scale with $L$, i.e., the total number of nodes in the network.


F. Other CB-DSBL Variants 


There are several alternatives to the bridge node based ADMM technique described above that could potentially be used to solve the M-step optimization in (13). In this section, we present empirical results comparing the performance and communication complexity of four different variants of the proposed CB-DSBL algorithm based on (i) bridge node based ADMM, (ii) Distributed ADMM (D-ADMM), (iii) Consensus Averaging Method of Multipliers (CA-MoM) [30], and (iv) EXact firsT ordeR Algorithm (EXTRA) [32]. Each of these decentralized algorithms is endowed with at least an $O(1/k)$ convergence rate, where $k$ stands for the iteration count. Besides these four, there are proximal gradient based methods [38], [39] relying on Nesterov-type acceleration techniques which also offer linear convergence rates. However, these algorithms require the objective function to be bounded and involve multiple communication rounds per iteration, which is of major concern in our work. As shown in Fig. 2, the proposed CB-DSBL variant relying on the bridge node based ADMM scheme is the most communication efficient one.


G. Implementation Issues 

The CB-DSBL algorithm can be seen as a decentralized EM algorithm to find the ML estimate of the hyperparameters $\gamma$ of a sparsity-inducing prior. Not surprisingly, CB-DSBL also inherits the tendency of the EM algorithm to converge to one of the multiple local maxima of the ML cost function $\log p(Y|\gamma)$. However, getting trapped in a local maximum is not a problem, as it has been shown that all local maxima of $\log p(Y|\gamma)$ are at most $m$-sparse and hence qualify as reasonably good solutions to our original sparse model estimation problem. Despite this, it is recommended to seed the EM algorithm with a $\gamma$ whose entries are all close to zero.

Another common issue is the wide variation in the energy of the nonzero entries of $x_j$ across the network. Specifically, in distributed event classification by a multitude of different types of sensors, each sensor node may employ its own distinct sensing modality and hence may perceive a different SNR. In such cases, a preconditioning step which normalizes the local response vector to unit energy is recommended for fast convergence of the CB-DSBL algorithm. The local sparse signal estimates can be readjusted at the end to undo the preconditioning.
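A minimal sketch of the preconditioning step, assuming it is a plain per-node scaling of the local measurement vector $y_j$ (the text does not specify more). Because the local measurement model is linear in $x_j$, dividing $y_j$ by $s = \|y_j\|_2$ divides the estimate by $s$ as well, so multiplying by $s$ undoes it; the local noise variance, if supplied to the algorithm, would also have to be divided by $s^2$. The function names are ours.

```python
import math


def precondition(y):
    """Scale the local measurement vector y_j to unit energy; return (y_scaled, scale)."""
    scale = math.sqrt(sum(v * v for v in y))
    if scale == 0.0:
        return list(y), 1.0          # nothing to normalize
    return [v / scale for v in y], scale


def undo_preconditioning(x_hat, scale):
    """The model is linear in x, so estimates obtained from y / scale are rescaled by scale."""
    return [scale * v for v in x_hat]


y_scaled, s = precondition([3.0, 4.0])
x_back = undo_preconditioning([0.6, 0.8], s)
```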


V. Simulation Results 

In this section, we present simulation results to examine the performance and complexity aspects of the proposed CB-DSBL algorithm when compared with the existing decentralized algorithms DRL-1, DCOMP and DCSP. The centralized M-SBL [14] is also included in the study as a performance benchmark for the proposed decentralized algorithm. The CB-DSBL variant considered here executes two ADMM iterations in the inner loop for every EM iteration in the outer loop. The value of the augmented Lagrangian parameter $\rho$ is chosen according to (28). For each experiment, the set $\mathcal{B}$ of bridge nodes is selected as described in Section IV-B. The local measurement matrices are chosen to be Gaussian random matrices with normalized columns. The nonzero signal coefficients are sampled independently from the Rademacher distribution, unless mentioned otherwise. For each trial, the connections between the nodes are generated according to a random Erdős-Rényi graph with a node connection probability of 0.8. In the final step of the M-SBL and CB-DSBL algorithms, the active support is identified by element-wise thresholding of the local hyperparameter vector $\gamma_j$ at node $j$ using the threshold $4\sigma_j^2$, where $\sigma_j^2$ denotes the local measurement noise variance.

A. Performance versus SNR

In the first set of experiments, we compare the normalized mean squared error (NMSE) and the normalized support error rate (NSER) of different algorithms for a range of SNRs. The support-aware LMMSE estimator sets the MSE performance benchmark for all the support-agnostic algorithms considered here. The NMSE and NSER error metrics are defined as
$$\mathrm{NMSE} = \frac{1}{L}\sum_{j=1}^{L} \frac{\|\hat{x}_j - x_j\|_2^2}{\|x_j\|_2^2}, \qquad \mathrm{NSER} = \frac{1}{L}\sum_{j=1}^{L} \frac{|S \setminus \hat{S}_j| + |\hat{S}_j \setminus S|}{|S|},$$
where $S$ is the true common support and $\hat{S}_j$ is the support estimated at node $j$. The network size is fixed to $L = 10$ nodes. As seen in Fig. 7, CB-DSBL matches the performance of centralized M-SBL in all cases. For higher SNRs (> 15 dB), both M-SBL and the proposed CB-DSBL attain the MSE of the support-aware LMMSE benchmark. CB-DSBL also outperforms DRL-1 and DCOMP in terms of both MSE and support recovery. This is attributed to the fact that the Gaussian prior used in CB-DSBL, with its alternate interpretation as a variational approximation to the Student's t-distribution, is more capable of inducing sparsity than the sum-log-sum penalty used in DRL-1. The poor performance of DCOMP is primarily due to its sequential approach towards support recovery, which prevents any corrections from being applied to the support estimate at each step of the algorithm. Contrary to earlier reported results, DCSP fails to perform better than DCOMP. This is because DCSP works only when the number of measurements exceeds $2k$, where $k$ is the size of the nonzero support.
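The support-detection rule and the two error metrics translate directly into code. The Python sketch below (function names are ours) thresholds $\gamma_j$ at $4\sigma_j^2$ as described at the start of this section (the strict inequality is our choice) and evaluates NMSE and NSER for lists of per-node estimates and supports, assuming a nonempty true support $S$; the $-20$ dB NMSE target used in the later experiments corresponds to an NMSE of 0.01.

```python
import math


def estimate_support(gamma_j, sigma2_j, factor=4.0):
    """Indices i with gamma_j[i] > factor * sigma2_j (threshold 4 * sigma_j^2 by default)."""
    return {i for i, g in enumerate(gamma_j) if g > factor * sigma2_j}


def nmse(x_hats, xs):
    """NMSE = (1/L) * sum_j ||x_hat_j - x_j||^2 / ||x_j||^2."""
    total = 0.0
    for xh, x in zip(x_hats, xs):
        err = sum((a - b) ** 2 for a, b in zip(xh, x))
        total += err / sum(b * b for b in x)
    return total / len(xs)


def nser(S_hats, S):
    """NSER = (1/L) * sum_j (|S minus S_hat_j| + |S_hat_j minus S|) / |S|, with |S| > 0."""
    S = set(S)
    return sum((len(S - set(Sh)) + len(set(Sh) - S)) / len(S) for Sh in S_hats) / len(S_hats)


def to_db(value):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(value)
```

With these helpers, the "less than 1% reconstruction error" criterion reads `to_db(nmse(x_hats, xs)) < -20`.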





(b) Nonzero coefficients drawn from Gaussian distribution

Fig. 7: NMSE (left) and NSER (right) for different SNRs. Other simulation parameters: L = 10 nodes, n = 50, m = 10 and 10% sparsity.


B. Tradeoff between Measurement Rate and Network Size

In the second set of experiments, we characterize the NMSE phase transition of the different algorithms in the $(m/n, L)$ plane to identify the minimum measurement rate ($m/n$) needed to ensure less than 1% signal reconstruction error (i.e., NMSE < -20 dB) for different network sizes ($L$) and a fixed sparsity rate ($k/n = 0.1$). As shown in Fig. 8, for the same network size, CB-DSBL is able to successfully recover the unknown signals at a much lower measurement rate than DRL-1, DCOMP and DCSP. This plot brings out the significant benefit of collaboration between nodes and of exploiting the JSM-2 model in reducing the number of measurements required per node for successful signal recovery. Additionally, since larger networks require fewer local measurements per node, the complexity of the local computations at each node also decreases as the network grows in size (see Section IV-E).

Fig. 8: NMSE phase transition plots of different algorithms illustrating the dependence of the minimum measurement rate required to guarantee less than 1% signal reconstruction error on the network size, for a signal sparsity rate fixed at 10%. Other simulation parameters: n = 50 and SNR = 30 dB.


C. Performance versus Measurement Rate ($m/n$)

In the third set of experiments, we compare the algorithms with respect to their ability to recover the exact support for different undersampling ratios. As seen in Fig. 9, for a similar network size, CB-DSBL is able to exploit the joint sparsity structure better than DCOMP, DCSP and DRL-1, and can correctly recover the support from significantly fewer measurements per node. Once again, CB-DSBL has support recovery performance identical to that of the centralized M-SBL, which was one of our design goals.


D. Phase Transition Characteristics 

In this set of experiments, we compare the phase transition behavior of different algorithms under NMSE and support recovery based pass/fail criteria. Fig. 10a plots the NMSE phase transition of different algorithms, where any point below the phase transition curve represents a sparsity rate ($k/n$) and measurement rate ($m/n$) tuple which results in an NMSE smaller than -20 dB, corresponding to less than 1 percent signal reconstruction error. Likewise, in Fig. 10b, points below the support recovery phase transition curve represent $(k/n, m/n)$ tuples which result in more than 90 percent accurate nonzero support reconstruction across all the nodes. Again, we see that CB-DSBL and the centralized M-SBL have identical performance, and both are capable of signal reconstruction from considerably fewer measurements than DRL-1, DCOMP and DCSP.

Fig. 9: Probability of exact support recovery versus number of measurements. Simulation parameters: n = 50, 10% sparsity, SNR = 15 dB and L = 10 nodes.

E. Tradeoff between Number of Bridge Nodes and Robustness 
to Node Failures 

In the final set of experiments, we demonstrate empirically that increasing the number of bridge nodes in the CB-DSBL algorithm makes it more robust to random node failures. As shown in Fig. 11, by gradually increasing the density of bridge nodes in the network, the CB-DSBL algorithm is able to tolerate higher rates of node failures without compromising on signal reconstruction performance. More interestingly, only a relatively small fraction of nodes (< 10% of the total network size) needs to be bridge nodes to ensure that CB-DSBL operates robustly in the face of random node failures.

VI. Conclusions 

In this paper, we proposed a novel iterative Bayesian algorithm called CB-DSBL for decentralized estimation of joint-sparse signals by multiple nodes in a network. The CB-DSBL algorithm employs an ADMM based decentralized EM procedure to efficiently learn the parameters of a joint sparsity inducing signal prior which is shared by all the nodes, and is subsequently used in the MAP estimation of the local signals. The CB-DSBL algorithm is well suited for applications where the privacy of the signal coefficients is important, as there is no direct exchange of either measurements or signal coefficients between the nodes. Experimental results showed that CB-DSBL outperforms the existing decentralized algorithms DRL-1, DCOMP and DCSP in terms of both NMSE and support recovery performance. We also established R-linear convergence of the underlying decentralized ADMM iterations. The amount of inter-node communication during the ADMM iterations is controlled by restricting each node to exchange information with only a small subset of its single hop neighbors. For this inter-node communication scheme, the ADMM convergence results presented here are applicable to any consensus driven optimization of a strongly convex objective function with a Lipschitz continuous gradient. Future extensions of this work could encompass exploiting any inter-vector correlation between the jointly sparse signals. Also, it would be interesting to analyze the convergence of the CB-DSBL algorithm in the presence of noisy communication links between nodes and under asynchronous network operation.

(b) Support recovery phase transition

Fig. 10: Phase transition plots for the different joint-sparse signal recovery algorithms. For all points on or below the NMSE phase transition curve, at most 1% average signal reconstruction error is incurred by the respective algorithm. Likewise, for all points on or below the support recovery phase transition curve, at least 90% of the nonzero support is successfully identified at all the network nodes. Other simulation parameters: n = 50, L = 5 nodes, SNR = 30 dB and number of trials = 200.

Appendix 

A. Derivation of the M-step Cost Function 

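Fig. 11: Plot illustrating the trade-off between the density of bridge nodes and the robustness of the proposed CB-DSBL algorithm to random node failures. For a given fraction of bridge nodes (no. of bridge nodes / L), each point on the curve represents the average node failure rate that can be tolerated by CB-DSBL while still achieving less than 1% signal reconstruction error (< -20 dB NMSE). The higher the number of bridge nodes, the more tolerant the network is to random node failures.

The conditional expectation in the E-step can be simplified as shown below.
$$\begin{aligned}
\mathbb{E}_{X|Y;\gamma^k}\left[\log p(Y,X;\gamma)\right] &= \mathbb{E}_{X|Y;\gamma^k}\left[\log p(Y|X) + \log p(X;\gamma)\right] \\
&= \mathbb{E}_{X|Y;\gamma^k}\left[\log p(Y|X)\right] + \sum_{j\in\mathcal{J}} \mathbb{E}_{x_j|y_j;\gamma^k}\left[\log p(x_j;\gamma)\right].
\end{aligned} \tag{30}$$
Using the prior $p(x_j;\gamma) = \mathcal{N}(0,\Gamma)$ with $\Gamma = \mathrm{diag}(\gamma)$, and discarding the terms independent of $\gamma$ in (30), the M-step objective function $Q(\gamma|\gamma^k)$ is given by
$$\begin{aligned}
Q(\gamma|\gamma^k) &= \sum_{j\in\mathcal{J}} \mathbb{E}_{x_j|y_j;\gamma^k}\left[-\frac{1}{2}\log|\Gamma| - \frac{1}{2}x_j^T\Gamma^{-1}x_j\right] \\
&= -\frac{1}{2}\sum_{j\in\mathcal{J}}\sum_{i=1}^{n}\left(\log\gamma(i) + \frac{\mathbb{E}_{x_j|y_j;\gamma^k}\left[x_j(i)^2\right]}{\gamma(i)}\right).
\end{aligned} \tag{31}$$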
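Setting $\partial Q/\partial\gamma(i) = 0$ in (31) gives $\gamma(i) = \frac{1}{L}\sum_{j\in\mathcal{J}} \mathbb{E}[x_j(i)^2 \mid y_j;\gamma^k]$, so the centralized maximizer is a network-wide average of local posterior second moments. The Python sketch below (names are ours; the second moments are made-up numbers, not outputs of an actual E-step) evaluates (31) up to an additive constant and checks numerically that this closed form beats nearby perturbations.

```python
import math


def q_value(gamma, second_moments):
    """Q(gamma | gamma^k) from (31), dropping constants.

    second_moments[j][i] stands for E[x_j(i)^2 | y_j; gamma^k] at node j.
    """
    return -0.5 * sum(math.log(g) + e / g
                      for row in second_moments
                      for g, e in zip(gamma, row))


def centralized_m_step(second_moments):
    """Stationary point of (31): gamma(i) = (1/L) * sum_j E[x_j(i)^2]."""
    L = len(second_moments)
    return [sum(col) / L for col in zip(*second_moments)]


moments = [[1.0, 0.2], [3.0, 0.4]]        # hypothetical values for L = 2 nodes, n = 2
gamma_hat = centralized_m_step(moments)   # approximately [2.0, 0.3]
```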


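B. Derivation of the Simplified Update for $\gamma_b$

By summing the dual variable update rule (18) across all nodes in $\mathcal{N}_b$, the following holds for all $b \in \mathcal{B}$:
$$\sum_{i\in\mathcal{N}_b}(\lambda_i^b)^{r+1} = \sum_{i\in\mathcal{N}_b}(\lambda_i^b)^{r} + \rho\sum_{j\in\mathcal{N}_b}\gamma_j^{r+1} - \rho|\mathcal{N}_b|\,\gamma_b^{r+1}. \tag{32}$$
Plugging (20) into (32), we obtain
$$\sum_{i\in\mathcal{N}_b}(\lambda_i^b)^{r+1} = 0 \quad \forall\, b \in \mathcal{B}. \tag{33}$$
Using (33) in (20), we obtain the simplified update for $\gamma_b$.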
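The cancellation in (32)-(33) can be checked numerically. The Python sketch below assumes, consistently with the augmented Lagrangian (24) and the constraint convention $\gamma_j - \gamma_b = 0$, that (20) has the form $\gamma_b^{r+1} = \frac{1}{|\mathcal{N}_b|}\sum_{j\in\mathcal{N}_b}\left(\gamma_j^{r+1} + (\lambda_j^b)^r/\rho\right)$ and that (18) is $(\lambda_j^b)^{r+1} = (\lambda_j^b)^r + \rho(\gamma_j^{r+1} - \gamma_b^{r+1})$; both forms are our reconstruction, since (18) and (20) are not reproduced in this section. The helper name and the numbers are hypothetical.

```python
def bridge_round(gammas, lams, rho):
    """One bridge update (assumed form of (20)) followed by the dual update
    (assumed form of (18)) at a single bridge node b.

    gammas[j] = gamma_j^{r+1} and lams[j] = (lambda_j^b)^r for j in N_b (scalars).
    """
    gb = sum(g + lam / rho for g, lam in zip(gammas, lams)) / len(gammas)   # assumed (20)
    new_lams = [lam + rho * (g - gb) for g, lam in zip(gammas, lams)]       # assumed (18)
    return gb, new_lams


# Starting from zero multipliers, the multipliers at b keep summing to zero,
# so the bridge update reduces to the plain neighborhood average.
gb1, lams1 = bridge_round([1.0, 2.0, 6.0], [0.0, 0.0, 0.0], rho=2.0)
gb2, lams2 = bridge_round([0.5, 0.5, 2.0], lams1, rho=2.0)
```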


C. Proof of Theorem 1 

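The proof of the convergence of ADMM discussed in the sequel is based on the proof given in [36]. However, our proof differs from the one in [36] due to the different scheme adopted here, which uses the auxiliary/bridge nodes to enforce consensus between the nodes. We make the following assumptions about the objective function $f$ in (23).

1) $f$ is twice differentiable and strongly convex in $\gamma_J$. This implies that there exists $m_f \in \mathbb{R}_+ \setminus \{0\}$ such that, for all $\gamma_J, \gamma_J'$, the following holds:
$$\langle \nabla f(\gamma_J) - \nabla f(\gamma_J'),\, \gamma_J - \gamma_J' \rangle \ge m_f\|\gamma_J - \gamma_J'\|_2^2. \tag{34}$$

2) $\nabla f$ is Lipschitz continuous, i.e., there exists a positive scalar $M_f$ such that, for all $\gamma_J, \gamma_J'$, we have
$$\|\nabla f(\gamma_J) - \nabla f(\gamma_J')\|_2 \le M_f\|\gamma_J - \gamma_J'\|_2. \tag{35}$$

Let $r$ denote the ADMM iteration count. From the zero-gradient optimality conditions of the $\gamma_J$- and $\gamma_B$-update steps, we have
$$\nabla f(\gamma_J^{r+1}) + E_1^T\lambda^r + \rho E_1^TE_1\gamma_J^{r+1} + \rho E_1^TE_2\gamma_B^r = 0, \tag{36}$$
$$E_2^T\lambda^r + \rho E_2^TE_2\gamma_B^{r+1} + \rho E_2^TE_1\gamma_J^{r+1} = 0. \tag{37}$$

From the dual variable update equation, we have
$$\lambda^{r+1} = \lambda^r + \rho(E_1\gamma_J^{r+1} + E_2\gamma_B^{r+1}). \tag{38}$$

Premultiplying (38) with $E_1^T$ and $E_2^T$ and combining the results with (36) and (37), respectively, gives
$$\nabla f(\gamma_J^{r+1}) + E_1^T\lambda^{r+1} + \rho E_1^TE_2(\gamma_B^r - \gamma_B^{r+1}) = 0, \tag{39}$$
$$E_2^T\lambda^{r+1} = 0. \tag{40}$$

By initializing $\lambda$ to zero, $\lambda^r$ always lies in the nullspace $\mathcal{N}(E_2^T)$, physically implying that the sum of the Lagrange multipliers of the nodes connected to a given bridge node is always equal to zero. Let us assume $\gamma_J^r \to \gamma_J^*$, $\gamma_B^r \to \gamma_B^*$ and $\lambda^r \to \lambda^*$ as $r \to \infty$. Then, letting $r \to \infty$ in (39), (40) and (38) gives
$$\nabla f(\gamma_J^*) + E_1^T\lambda^* = 0, \tag{41}$$
$$E_2^T\lambda^* = 0, \tag{42}$$
$$E_1\gamma_J^* + E_2\gamma_B^* = 0. \tag{43}$$

Note that condition (43) implies consensus among $\gamma_j$, $j \in \mathcal{J}$, upon convergence. By subtracting (41), (42) and (43) from (39), (40) and (38), respectively, we get the difference terms needed for showing the convergence results:
$$\nabla f(\gamma_J^{r+1}) - \nabla f(\gamma_J^*) + E_1^T(\lambda^{r+1} - \lambda^*) + \rho E_1^TE_2(\gamma_B^r - \gamma_B^{r+1}) = 0, \tag{44}$$
$$E_2^T(\lambda^{r+1} - \lambda^*) = 0, \tag{45}$$
$$\lambda^{r+1} - \lambda^r = \rho E_1(\gamma_J^{r+1} - \gamma_J^*) + \rho E_2(\gamma_B^{r+1} - \gamma_B^*). \tag{46}$$

Premultiplying (46) with $E_2^T$ and using (40), we obtain
$$E_2^TE_1(\gamma_J^{r+1} - \gamma_J^*) = -E_2^TE_2(\gamma_B^{r+1} - \gamma_B^*). \tag{47}$$

Further, from the strong convexity of $f$ and using (44), we can write
$$\begin{aligned}
m_f\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 &\le \langle E_1^T(\lambda^* - \lambda^{r+1}),\, \gamma_J^{r+1} - \gamma_J^* \rangle + \rho\langle E_1^TE_2(\gamma_B^{r+1} - \gamma_B^r),\, \gamma_J^{r+1} - \gamma_J^* \rangle \\
&= \langle \lambda^* - \lambda^{r+1},\, E_1(\gamma_J^{r+1} - \gamma_J^*) \rangle + \rho\langle \gamma_B^{r+1} - \gamma_B^r,\, E_2^TE_1(\gamma_J^{r+1} - \gamma_J^*) \rangle \\
&= \langle \lambda^* - \lambda^{r+1},\, E_1(\gamma_J^{r+1} - \gamma_J^*) \rangle - \rho\langle \gamma_B^{r+1} - \gamma_B^r,\, E_2^TE_2(\gamma_B^{r+1} - \gamma_B^*) \rangle \\
&= \left\langle \lambda^* - \lambda^{r+1},\, \tfrac{1}{\rho}(\lambda^{r+1} - \lambda^r) - E_2(\gamma_B^{r+1} - \gamma_B^*) \right\rangle - \rho\langle \gamma_B^{r+1} - \gamma_B^r,\, E_2^TE_2(\gamma_B^{r+1} - \gamma_B^*) \rangle \\
&= \tfrac{1}{\rho}\langle \lambda^* - \lambda^{r+1},\, \lambda^{r+1} - \lambda^r \rangle + \rho\langle E_2(\gamma_B^{r+1} - \gamma_B^r),\, E_2(\gamma_B^* - \gamma_B^{r+1}) \rangle.
\end{aligned} \tag{48}$$

Here, the first identity is obtained by using the adjoint property of the inner product, $\langle A^Tx, y\rangle = \langle x, Ay\rangle$. The second, third and fourth identities are obtained by using (47), (46) and (45), respectively. By defining $u = [(E_2\gamma_B)^T\ \lambda^T]^T$ and the weighted norm $\|u\|_G^2 = u^TGu$, where
$$G = \begin{bmatrix} \rho I & 0 \\ 0 & \frac{1}{\rho} I \end{bmatrix},$$
the RHS in (48) can be expressed as $(u^r - u^{r+1})^TG(u^{r+1} - u^*)$. Using the identity
$$2(u^r - u^{r+1})^TG(u^{r+1} - u^*) = \|u^r - u^*\|_G^2 - \|u^{r+1} - u^*\|_G^2 - \|u^r - u^{r+1}\|_G^2, \tag{49}$$
the inequality in (48) can be rewritten as
$$m_f\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 \le \frac{1}{2}\left(\|u^r - u^*\|_G^2 - \|u^{r+1} - u^*\|_G^2 - \|u^r - u^{r+1}\|_G^2\right). \tag{50}$$

By discarding the non-positive terms on the RHS of (50), we obtain the following upper bound on the primal optimality gap:
$$\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 \le \frac{1}{2m_f}\|u^r - u^*\|_G^2. \tag{51}$$

In Appendix D, we prove the monotonic convergence of $u^r$ to $u^*$. Thus, from the monotonic decay of the RHS in (51), we have R-linear convergence of $\gamma_J^r$ to $\gamma_J^*$.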
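Identity (49) is the polarization-type identity $2\langle a - b, b\rangle_G = \|a\|_G^2 - \|b\|_G^2 - \|a - b\|_G^2$ with $a = u^r - u^*$ and $b = u^{r+1} - u^*$. The Python sketch below checks it numerically for the diagonal weighting $G = \mathrm{diag}(\rho I, \rho^{-1}I)$; the vectors and the value of $\rho$ are arbitrary, and the helper names are ours.

```python
def g_inner(x, y, w):
    """<x, y>_G for the diagonal weight matrix G = diag(w)."""
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))


def g_sq(x, w):
    """||x||_G^2."""
    return g_inner(x, x, w)


def sub(x, y):
    return [xi - yi for xi, yi in zip(x, y)]


def identity_49_sides(u_r, u_r1, u_star, w):
    """Return (LHS, RHS) of (49)."""
    lhs = 2.0 * g_inner(sub(u_r, u_r1), sub(u_r1, u_star), w)
    rhs = g_sq(sub(u_r, u_star), w) - g_sq(sub(u_r1, u_star), w) - g_sq(sub(u_r, u_r1), w)
    return lhs, rhs


rho = 1.7
w = [rho] * 3 + [1.0 / rho] * 3           # first block: E2 gamma_B, second block: lambda
u_r = [0.3, -1.2, 2.0, 0.5, 0.1, -0.7]
u_r1 = [0.1, -0.9, 1.5, 0.2, 0.4, -0.2]
u_star = [0.0, -1.0, 1.0, 0.0, 0.0, 0.0]
lhs, rhs = identity_49_sides(u_r, u_r1, u_star, w)
```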


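D. Proof of Monotonic Convergence of $u^r$ to $u^*$

In order to prove monotonic convergence of $u^r$ to $u^*$, it is sufficient to show that there exists a $\delta > 0$ such that
$$\|u^{r+1} - u^*\|_G^2 \le \frac{1}{1+\delta}\|u^r - u^*\|_G^2. \tag{52}$$

By rearranging the terms in (50), we have
$$\|u^{r+1} - u^*\|_G^2 \le \|u^r - u^*\|_G^2 - \|u^r - u^{r+1}\|_G^2 - 2m_f\|\gamma_J^{r+1} - \gamma_J^*\|_2^2. \tag{53}$$

By comparing the terms in (52) and (53), it is easy to see that if
$$2m_f\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 + \|u^{r+1} - u^r\|_G^2 \ge \delta\|u^{r+1} - u^*\|_G^2, \tag{54}$$
or equivalently,
$$2m_f\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 + \rho\|E_2(\gamma_B^{r+1} - \gamma_B^r)\|_2^2 + \frac{1}{\rho}\|\lambda^{r+1} - \lambda^r\|_2^2 \ge \delta\left(\rho\|E_2(\gamma_B^{r+1} - \gamma_B^*)\|_2^2 + \frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2\right), \tag{55}$$
then we have monotonic convergence of $u^r$ to $u^*$. We now proceed to derive upper bounds for the RHS terms $\rho\|E_2(\gamma_B^{r+1} - \gamma_B^*)\|_2^2$ and $\frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2$ in terms of the LHS terms. These upper bounds will be used in the sequel to establish the inequality in (55).

• An upper bound for $\rho\|E_2(\gamma_B^{r+1} - \gamma_B^*)\|_2^2$:
Note that for any two vectors $a$, $b$ and a scalar $\mu > 1$,
$$\|a + b\|_2^2 \ge (1 - \mu)\|a\|_2^2 + \left(1 - \frac{1}{\mu}\right)\|b\|_2^2. \tag{56}$$
Applying inequality (56) to (46), with $a = \rho E_1(\gamma_J^{r+1} - \gamma_J^*)$ and $b = \rho E_2(\gamma_B^{r+1} - \gamma_B^*)$, we get the following upper bound:
$$\rho\|E_2(\gamma_B^{r+1} - \gamma_B^*)\|_2^2 \le \frac{\mu}{\mu - 1}\cdot\frac{1}{\rho}\|\lambda^{r+1} - \lambda^r\|_2^2 + \mu\rho\,\sigma^2_{\max}(E_1)\|\gamma_J^{r+1} - \gamma_J^*\|_2^2. \tag{57}$$
Here, $\sigma_{\max}(E_1)$ is the largest singular value of $E_1$.

• An upper bound for $\frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2$:
A similar application of inequality (56) to (44), with a scalar $\nu > 1$ in place of $\mu$, results in an upper bound for $\frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2$ as shown below:
$$\|E_1^T(\lambda^{r+1} - \lambda^*)\|_2^2 \le \frac{\nu}{\nu - 1}\|\nabla f(\gamma_J^{r+1}) - \nabla f(\gamma_J^*)\|_2^2 + \nu\|\rho E_1^TE_2(\gamma_B^{r+1} - \gamma_B^r)\|_2^2,$$
$$\frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2 \le \frac{\nu}{(\nu - 1)\rho\,\sigma^2_{\min}(E_1)}\|\nabla f(\gamma_J^{r+1}) - \nabla f(\gamma_J^*)\|_2^2 + \frac{\nu\rho\,\sigma^2_{\max}(E_1)}{\sigma^2_{\min}(E_1)}\|E_2(\gamma_B^{r+1} - \gamma_B^r)\|_2^2. \tag{58}$$
From the Lipschitz continuity of $\nabla f$, we obtain the following modified upper bound:
$$\frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2 \le \frac{\nu M_f^2}{(\nu - 1)\rho\,\sigma^2_{\min}(E_1)}\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 + \frac{\nu\rho\,\sigma^2_{\max}(E_1)}{\sigma^2_{\min}(E_1)}\|E_2(\gamma_B^{r+1} - \gamma_B^r)\|_2^2. \tag{59}$$
Here, $\sigma_{\min}(E_1)$ denotes the smallest singular value of $E_1$ and $\nu$ is a positive scalar greater than unity.

By summing the upper bounds in (57) and (59), we get
$$\rho\|E_2(\gamma_B^{r+1} - \gamma_B^*)\|_2^2 + \frac{1}{\rho}\|\lambda^{r+1} - \lambda^*\|_2^2 \le \frac{1}{\delta}\left(2m_f\|\gamma_J^{r+1} - \gamma_J^*\|_2^2 + \rho\|E_2(\gamma_B^{r+1} - \gamma_B^r)\|_2^2 + \frac{1}{\rho}\|\lambda^{r+1} - \lambda^r\|_2^2\right), \tag{60}$$
where
$$\delta = \min\left\{ \frac{2m_f}{\mu\rho\,\sigma^2_{\max}(E_1) + \dfrac{\nu M_f^2}{(\nu-1)\rho\,\sigma^2_{\min}(E_1)}},\; \frac{\sigma^2_{\min}(E_1)}{\nu\,\sigma^2_{\max}(E_1)},\; \frac{\mu - 1}{\mu} \right\}. \tag{61}$$

Thus, for $\delta$ as defined above, the inequality (55) holds and consequently the inequality (52) also holds, thereby establishing the Q-linear convergence of the sequence $u^r$ to $u^*$.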
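For completeness, (56) follows from the Cauchy-Schwarz and arithmetic-geometric mean inequalities, which give $2|\langle a, b\rangle| \le \mu\|a\|_2^2 + \mu^{-1}\|b\|_2^2$ for any $\mu > 0$:

```latex
\|a+b\|_2^2 = \|a\|_2^2 + \|b\|_2^2 + 2\langle a, b\rangle
  \ge \|a\|_2^2 + \|b\|_2^2 - \mu\|a\|_2^2 - \tfrac{1}{\mu}\|b\|_2^2
  = (1-\mu)\|a\|_2^2 + \left(1 - \tfrac{1}{\mu}\right)\|b\|_2^2 .
```

Restricting to $\mu > 1$ keeps the coefficient of $\|b\|_2^2$ positive, which is what allows (57) and (58) to be solved for the $b$ term.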


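E. Proof of Theorem 2

Let $\delta_{\text{opt}}$ denote the maximum value of $\delta$ over all $\rho > 0$. Then, we can write
$$\delta_{\text{opt}} = \max_{\rho>0}\left(\max_{\mu>1,\,\nu>1}\min\left(f_1(\mu,\nu,\rho), f_2(\nu), f_3(\mu)\right)\right) = \max_{\mu>1,\,\nu>1}\left(\max_{\rho>0}\min\left(f_1(\mu,\nu,\rho), f_2(\nu), f_3(\mu)\right)\right), \tag{62}$$
where the scalar functions $f_1$, $f_2$ and $f_3$ represent the three terms inside the minimum operator in (26). The following two lemmas summarize the optimization of $\delta$ in (62).

Lemma 2. $\delta_{\text{opt}} = \max_{\mu>1,\,\nu>1}\min\left(\bar{f}_1(\mu,\nu), f_2(\nu), f_3(\mu)\right)$, where $\bar{f}_1(\mu,\nu) = \max_{\rho>0} f_1(\mu,\nu,\rho)$.

Proof. See Appendix F. □

Lemma 3. There exists a unique $(\mu,\nu) = (\mu^*,\nu^*)$ which simultaneously satisfies
1) $\bar{f}_1 = f_2 = f_3$,
2) $\mu > 1$, $\nu > 1$.
Further, such a $(\mu^*,\nu^*)$ maximizes $g(\mu,\nu) = \min\left(\bar{f}_1(\mu,\nu), f_2(\nu), f_3(\mu)\right)$ over $\mu, \nu > 1$.

Proof. See Appendix G. □

The scalar function $f_1$ in Lemma 2 is maximized at
$$\bar{\rho} = \frac{M_f}{\sigma_{\max}\sigma_{\min}}\sqrt{\frac{\nu}{\mu(\nu-1)}}, \quad \text{giving} \quad \bar{f}_1 = \frac{m_f\,\sigma_{\min}}{M_f\,\sigma_{\max}}\sqrt{\frac{\nu-1}{\mu\nu}}.$$
Further, by solving for the unique tuple $(\mu^*,\nu^*)$ which satisfies the two optimality conditions specified in Lemma 3, the optimal augmented Lagrangian parameter $\rho$ and the corresponding optimal $\delta$ can be shown to be equal to $\rho_{\text{opt}}$ and $\delta_{\text{opt}}$ as defined in Theorem 2. Indeed, with $\kappa = \sigma^2_{\max}/\sigma^2_{\min}$ and $\kappa_f = M_f/m_f$, the conditions $\bar{f}_1 = f_2 = f_3 = \delta$ give $\nu^* = 1/(\kappa\delta)$ and $\mu^* = 1/(1-\delta)$; substituting these into $\bar{f}_1^2 = \delta^2$ yields $\kappa(\kappa_f^2 - 1)\delta^2 + (\kappa + 1)\delta - 1 = 0$, whose positive root is (29), and inserting $\mu^*$, $\nu^*$ into $\bar{\rho}$ gives (28).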
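Lemmas 2 and 3 can be sanity-checked numerically. Writing $\bar{f}_1 = \frac{1}{\kappa_f}\sqrt{\frac{\nu-1}{\kappa\mu\nu}}$, $f_2 = 1/(\nu\kappa)$ and $f_3 = (\mu-1)/\mu$, a brute-force grid search for the maximum of $g(\mu,\nu)$ should approach the closed form (29) from below. The Python sketch below does this; the function names, grid range and resolution are our choices, and the range happens to contain $(\mu^*,\nu^*)$ for the parameter pairs tested.

```python
import math


def g_value(mu, nu, kappa, kappa_f):
    """min(f1_bar, f2, f3), with f1 already maximized over rho (Lemma 2)."""
    f1_bar = math.sqrt((nu - 1) / (kappa * mu * nu)) / kappa_f
    f2 = 1.0 / (nu * kappa)
    f3 = (mu - 1) / mu
    return min(f1_bar, f2, f3)


def delta_opt_closed_form(kappa, kappa_f):
    """Equation (29)."""
    return 2.0 / (kappa + 1 + math.sqrt((kappa - 1) ** 2 + 4 * kappa * kappa_f ** 2))


def delta_opt_grid(kappa, kappa_f, steps=300, upper=4.0):
    """Brute-force maximum of g over a uniform grid on (1, upper] x (1, upper]."""
    best = 0.0
    for a in range(1, steps + 1):
        mu = 1 + (upper - 1) * a / steps
        for b in range(1, steps + 1):
            nu = 1 + (upper - 1) * b / steps
            best = max(best, g_value(mu, nu, kappa, kappa_f))
    return best
```

For $\kappa = 2$ and $\kappa_f = 1$ the maximizer is $\mu^* = \nu^* = 1.5$, where all three terms equal $1/3 = 1/(1+\kappa)$.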


G. Proof of Lemma 

In order to prove the Lemma, we claim the following. 

a) For any e > 0, there exist positive constants and B,j 
such that gip, v) < e when either p > or v > B^, 
holds. 

b) Any points (p, v) which satisfies condition 2 but does not 
satisfy condition 1 cannot be a local maximum of g. 

Note that claim (a) holds trivially for suitably chosen $B_\mu$ and $B_\nu$. To verify claim (b), consider a point $(\mu_0,\nu_0)$ that satisfies condition 2 but not condition 1. Three cases then need to be considered.

• Case-I: $\bar{f}_1$, $f_2$ and $f_3$ are distinct at $(\mu_0,\nu_0)$. Without loss of generality, let $g = \bar{f}_1$ at $(\mu_0,\nu_0)$. Then, from the continuity of $\bar{f}_1$, $f_2$ and $f_3$, there exists a ball $B_\epsilon$ of radius $\epsilon > 0$ centered at $(\mu_0,\nu_0)$ inside which $g = \bar{f}_1$ holds. Since $g$ is strictly monotonic in $\mu$ and $\nu$ inside $B_\epsilon$, there exists $(\mu,\nu) \in B_\epsilon$ such that $g(\mu,\nu) > g(\mu_0,\nu_0)$. Hence, $(\mu_0,\nu_0)$ is not a local maximum.

• Case-II: At $(\mu_0,\nu_0)$, two of $\bar{f}_1$, $f_2$ and $f_3$ are equal and strictly greater than the remaining one. The same argument as in Case-I applies here as well.

• Case-III: At $(\mu_0,\nu_0)$, two of $\bar{f}_1$, $f_2$ and $f_3$ are equal and strictly less than the remaining one. Without loss of generality, let $\bar{f}_1 = f_2 < f_3$. Let $C$ denote the continuous curve in the $(\mu,\nu)$ plane each of whose points satisfies $\bar{f}_1 = f_2$. Clearly, $(\mu_0,\nu_0)$ lies on $C$. Moreover, $C$ has uncountably many points inside $B_\epsilon$, with $B_\epsilon$ defined as in Case-I. Due to the monotonicity of $g$ along $C$, there exists $(\mu,\nu) \in B_\epsilon$ such that $g(\mu,\nu) > g(\mu_0,\nu_0)$. Hence, $(\mu_0,\nu_0)$ is not a local maximum.

From claim (a) and the fact that $g$ evaluates to zero at the boundary points ($\mu = 1$ or $\nu = 1$), we may restrict the search for the global maximizer of $g$ to the set $D = \{(\mu,\nu) \mid 1 \le \mu \le B_\mu,\ 1 \le \nu \le B_\nu\}$. Then, from claim (b), the uniqueness of $(\mu^*,\nu^*) \in D$ and the Weierstrass extreme value theorem, it follows that $(\mu^*,\nu^*)$ is indeed the unique global maximizer of the continuous function $g$. This completes the proof.
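The structure of Lemma 3 can be checked numerically on stand-in functions. The sketch below uses the hypothetical choices $f_2(\nu) = 1 - 1/\nu$, $f_3(\mu) = 1 - 1/\mu$ and $\bar{f}_1(\mu,\nu) = c/(\mu\nu)$ with $c = 2$. These are assumptions, not the expressions from (26); they are chosen only because, as in the proof, the resulting $g$ vanishes on the boundary $\mu = 1$ or $\nu = 1$ and decays as $\mu$ or $\nu$ grows. A brute-force grid search for the maximizer of $g$ is compared with the point where all three functions are equal.

```python
# Hypothetical stand-ins for the terms of g (NOT the functions from (26)):
#   f2(nu) = 1 - 1/nu and f3(mu) = 1 - 1/mu are zero at 1 and increasing,
#   fbar1(mu, nu) = c / (mu * nu) is decreasing and vanishes as mu or nu grows.


def f2(nu):
    return 1.0 - 1.0 / nu


def f3(mu):
    return 1.0 - 1.0 / mu


def fbar1(mu, nu, c=2.0):
    return c / (mu * nu)


def g(mu, nu):
    """Max-min objective of Lemma 3, evaluated on the stand-in functions."""
    return min(fbar1(mu, nu), f2(nu), f3(mu))


def grid_argmax(lo=1.0, hi=6.0, step=0.02):
    """Brute-force maximizer of g over a uniform grid on [lo, hi] x [lo, hi]."""
    n = int(round((hi - lo) / step))
    pts = [lo + i * step for i in range(n + 1)]
    return max((g(m, v), m, v) for m in pts for v in pts)  # (value, mu, nu)


def equalizer(c=2.0):
    """Point with fbar1 = f2 = f3: by symmetry mu = nu = t, where t**2 - t - c = 0."""
    t = (1.0 + (1.0 + 4.0 * c) ** 0.5) / 2.0
    return t, t


val, mu_g, nu_g = grid_argmax()
mu_s, nu_s = equalizer()
print(round(val, 6), round(mu_g, 2), round(nu_g, 2), mu_s, nu_s)  # → 0.5 2.0 2.0 2.0 2.0
```

For these stand-ins, setting $\mu = \nu = t$ in $\bar{f}_1 = f_2 = f_3$ gives $t^2 - t - c = 0$, so $t = 2$ and $g = 1/2$, and the grid search lands on the same point.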


F. Proof of Lemma 2

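Let $\hat{\rho}(\mu,\nu) = \arg\max_{\rho>0} f_1(\rho,\nu,\mu)$. Then, by restricting the feasible set in (62) to $\rho = \hat{\rho}(\mu,\nu)$, we have

$$\delta_{\mathrm{opt}} \ge \max_{\mu,\nu>1}\Big\{\max_{\rho=\hat{\rho}(\mu,\nu)}\min\big(f_1(\rho,\nu,\mu),\,f_2(\nu),\,f_3(\mu)\big)\Big\} = \max_{\mu,\nu>1}\Big\{\min\big(\bar{f}_1(\mu,\nu),\,f_2(\nu),\,f_3(\mu)\big)\Big\}. \tag{63}$$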


On the other hand, from (62) and using $\bar{f}_1 \ge f_1$, we have


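$$\delta_{\mathrm{opt}} = \max_{\mu,\nu>1}\Big\{\max_{\rho>0}\min\big(f_1(\rho,\nu,\mu),\,f_2(\nu),\,f_3(\mu)\big)\Big\} \le \max_{\mu,\nu>1}\Big\{\min\big(\bar{f}_1(\mu,\nu),\,f_2(\nu),\,f_3(\mu)\big)\Big\}. \tag{64}$$

Combining (63) and (64) establishes Lemma 2.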
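The key step behind (63) and (64) is that $f_2$ and $f_3$ do not depend on $\rho$, so maximizing $\min(f_1, f_2, f_3)$ over $\rho$ is the same as replacing $f_1$ by its maximum $\bar{f}_1$. The sketch below checks this on a hypothetical $f_1(\rho,\nu,\mu) = 2c\rho/((1+\rho^2)\mu\nu)$, which peaks at $\rho = 1$. This $f_1$, the stand-ins for $f_2$ and $f_3$, the $\rho$ grid and the test points are illustrative assumptions, not the functions of (26), and the maximum over $\rho > 0$ is approximated on a finite grid.

```python
# Hypothetical stand-in: f1(rho, nu, mu) = c * 2*rho / ((1 + rho**2) * mu * nu).
# Over rho it peaks at rho = 1, where it equals c / (mu * nu).


def f1(rho, nu, mu, c=2.0):
    return c * 2.0 * rho / ((1.0 + rho * rho) * mu * nu)


def f2(nu):
    return 1.0 - 1.0 / nu  # hypothetical, independent of rho


def f3(mu):
    return 1.0 - 1.0 / mu  # hypothetical, independent of rho


RHOS = [0.01 * k for k in range(1, 1001)]  # finite grid approximating rho > 0


def lhs(mu, nu):
    """Inner problem of (62): max over rho of min(f1, f2, f3)."""
    return max(min(f1(r, nu, mu), f2(nu), f3(mu)) for r in RHOS)


def rhs(mu, nu):
    """min(fbar1, f2, f3), with fbar1 = max over rho of f1 on the same grid."""
    fbar1 = max(f1(r, nu, mu) for r in RHOS)
    return min(fbar1, f2(nu), f3(mu))


pairs = [(1.5, 1.5), (2.0, 2.0), (2.0, 4.0), (5.0, 1.2)]
print(all(abs(lhs(m, v) - rhs(m, v)) < 1e-12 for m, v in pairs))  # → True
```

On a finite grid the two sides agree exactly, because taking the minimum with the $\rho$-independent value $\min(f_2, f_3)$ is monotone and therefore commutes with the maximum over $\rho$.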